Novel Procollagens

The present invention concerns novel molecules, in particular novel
procollagen molecules, together with collagen molecules, fibrils and fibres comprising a
non-natural combination of collagen α-chains, DNA encoding same, expression hosts
transformed or transfected with same, transgenic animals and methods of producing a
non-natural collagen.

Collagen (also known as processed procollagen molecule and triple helical
processed procollagen monomeric molecule) (for general reviews see Kadler, K., 1995,
Protein Profile, "Extracellular Matrix 1: fibril-forming collagens", 2: 491-619; Ayad, S. et
al., 1994, The Extracellular Matrix Facts Book, Academic Press, London, ISBN
0-12-068910-3 and references therein) is a major structural protein in animals where it
occurs in the extracellular matrix (ECM) of connective tissues, mostly in the form of fibrils
(also known as polymeric collagen). The collagen fibrils (polymeric collagen) are the major
source of mechanical strength of connective tissues, providing a substratum for cell
attachment and a scaffold for dynamic molecular interactions. The family of collagens
comprises complex multidomain proteins comprising three collagen α-chains wound into
a triple helix. At least twenty genetically-distinct collagen types have been described to date
and they can be classified into subgroups on the basis of gene homology and function of the
encoded protein. Fibril-forming collagens (types I, II, III, V and XI; see Table 1) are
synthesized as soluble procollagens (also known as proα chains, procollagen α-chains and
monomer chains) and comprise a C-propeptide, a Gly-X-Y repeat containing region (which
in the case of monomer chains of fibril-forming collagens comprises an uninterrupted
collagen α-chain) and an N-propeptide. The proα chains trimerise to form unprocessed
procollagen molecules (also known as monomeric procollagen molecules and trimerised
proα chains), assembling into fibrillar structures upon enzymic cleavage of their N- and
C-terminal propeptide domains (the N- and C-propeptides) (see Figure 1).

Although the genes encoding the proα chains are remarkably similar,
relatively little is known about the processes which control the folding and trimerization of
the proα chains (Dion, A.S. and Myers, J.C., 1987, J. Molec. Biol., 193: 127-143), and only
a restricted range of collagens is formed. For example, skin fibroblasts synthesise
coincidentally the six highly homologous proα chains (proα1(I), proα1(III), proα1(V),
proα2(I), proα2(V) and proα3(V)). Despite the great number of possible combinations of
the six proα chains, only specific combinations of collagen chains occur, namely those
resulting in types I, III and V collagen. Type I collagen exists as a heterotrimer and
assembles with the stoichiometry of two proα1(I) chains and one proα2(I) chain
([proα1(I)]2 proα2(I)). Homotrimers of proα2(I) have not been detected and hence the
inclusion of this chain in a trimer is dependent upon its association with proα1(I) chains.
Type III collagens comprise a homotrimer ([proα1(III)]3), and the constituent chains do not
assemble with either of the Type I collagen proα chains. Type V collagen displays
heterogeneity with regard to chain composition, forming both homo- ([proα3(V)]3) and
hetero-trimers ([proα1(V)]2 proα2(V) and [proα1(V) proα2(V) proα3(V)]).
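
To give a sense of scale for "the great number of possible combinations", the following sketch counts the trimer compositions that six chains could form in principle and compares them with the compositions listed above. It assumes, purely for illustration, that a trimer is an unordered set of three chains with repeats allowed; the names `CHAINS`, `normalise`, `possible_trimers` and `observed_trimers` are hypothetical and are not part of the specification.

```python
from itertools import combinations_with_replacement

# The six proα chains co-synthesised by skin fibroblasts, as listed in the text.
CHAINS = ["proα1(I)", "proα2(I)", "proα1(III)", "proα1(V)", "proα2(V)", "proα3(V)"]


def normalise(trimer):
    """Represent a trimer by its sorted chain names so that order is ignored."""
    return tuple(sorted(trimer))


# Assumption: a trimer is an unordered multiset of three chains.
possible_trimers = {
    normalise(t) for t in combinations_with_replacement(CHAINS, 3)
}

# Compositions described in the text for types I, III and V collagen.
observed_trimers = {
    normalise(("proα1(I)", "proα1(I)", "proα2(I)")),
    normalise(("proα1(III)", "proα1(III)", "proα1(III)")),
    normalise(("proα3(V)", "proα3(V)", "proα3(V)")),
    normalise(("proα1(V)", "proα1(V)", "proα2(V)")),
    normalise(("proα1(V)", "proα2(V)", "proα3(V)")),
}

print(len(possible_trimers), "possible;", len(observed_trimers), "listed in the text")
```

Under this counting convention only a small fraction of the conceivable compositions corresponds to the trimers described above.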

The C-propeptide is known to be implicated in the assembly of the monomer
chains into trimerised proα chains (unprocessed procollagen) prior to cleavage of the N- and
C-propeptides and formation of collagen in fibril-forming proα chains. The assembly of the
three monomer chains into trimerised proα chains is initiated by association of the
C-propeptides. This association can be divided into two stages: an initial recognition event
between the proα chains which determines chain selection and then a registration event
which leads to correct alignment and folding of the triple helix. Comparison (Figure 2) of
the amino acid sequences of the C-propeptides of proα1(I), proα2(I) and proα1(III) proα
chains, which assemble to form collagen types I and III, demonstrates the striking level of
sequence similarity between these proα chains yet, despite the homology, they invariably
assemble and fold in a collagen type-specific manner.

It has now been found that the C-propeptides, and more particularly certain
sequences within them, are not only necessary but are also sufficient to determine the
type-specific assembly of the moieties to which they are attached. Because of the presence
of these certain sequences, the C-propeptides are capable of autonomously directing the
assembly of the attached moieties, which in particular may be an alien collagen α-chain.
The present inventors have isolated and characterised a region of the C-propeptide which
defines the chain selection event but which does not affect the subsequent folding. This has
allowed the synthesis of novel proα chains which have formed novel trimerised proα chains
and collagen. Now that the chain selection interactions between the proα chains can be
controlled, a vast range of novel trimeric molecules, in particular collagens, may be
synthesised at will using existing and novel proα chains and C-propeptides. These new
molecules may possess selected biological and physical properties and have a wide range
of uses. For example, novel collagens may be used in industries which use collagen either
as a product or as part of a process. Such collagens and uses may include for example: novel
gelatins for use in food, food processing and photography; novel finings for clearing yeast
during the brewing process; novel gelatins for the food packaging industry; novel polymers
for the manufacture of textiles; novel glues for use in construction, building and
manufacturing; novel coatings for tablets; novel glues for use with the human or animal
body; novel collagens for use as body implants; novel collagens and procollagens as
adjuvants; novel collagens and procollagens as molecular carriers for drugs and
pharmaceuticals; and as modulators of collagen fibril formation for use in, for example,
wound healing and fibrosis.

According to the present invention there is provided a molecule comprising
at least a first moiety having the activity of a procollagen C-propeptide and a second moiety
selected from any one of the group of an alien collagen α-chain and non-collagen materials,
the first moiety being attached to the second moiety.

The molecule may be able to bind to other similar molecules. It may trimerise
with other similar molecules.

The first moiety will generally be attached to the C-terminal end of the second 
moiety, although intervening amino acid residues may be present. 

The first moiety may comprise a proα chain C-propeptide or a partially
modified form thereof or an analogue thereof, and when forming the C-terminal region of
a proα chain, may allow the molecule to bind to other similar molecules. The C-propeptide
region of a proα chain may be the C-terminal fragment resulting from C-proteinase
cleavage of a proα chain. The C-proteinase may cleave between the residues G and D or A
and D or an analogue thereof in the sequence FAPYYGD (residues 376-382 of SEQ ID NO:
2), YYRAD (residues 1-5 of SEQ ID NO: 14) or FYRAD (residues 284-288 of SEQ ID
NO: 1) (Figure 2) or an analogue thereof.
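
The cleavage rule just described can be sketched as a simple string operation. This is a minimal illustration only: it searches for the three motifs quoted above and cuts immediately before the final D, so that the D becomes the first residue of the C-propeptide fragment. The function name and the toy sequence are hypothetical, and the toy sequence is not taken from any SEQ ID.

```python
# Cleavage motifs quoted in the text; the cut falls between G/D or A/D,
# i.e. immediately before the final D of each motif.
CLEAVAGE_MOTIFS = ("FAPYYGD", "YYRAD", "FYRAD")


def split_at_c_proteinase_site(chain):
    """Return (upstream_part, c_propeptide), or None if no motif is present.

    Motifs are tried in the order listed above, not by position in the chain.
    """
    for motif in CLEAVAGE_MOTIFS:
        idx = chain.find(motif)
        if idx != -1:
            cut = idx + len(motif) - 1  # index of the D that starts the C-propeptide
            return chain[:cut], chain[cut:]
    return None


# Hypothetical toy chain: a short Gly-X-Y-like stretch, a motif, then filler residues.
toy_chain = "GPPGPPGPP" + "FYRAD" + "QPRSA"
upstream, c_propeptide = split_at_c_proteinase_site(toy_chain)
print(upstream, c_propeptide)
```

The sketch ignores the "or an analogue thereof" wording; handling analogous cleavage sites would need a looser pattern than an exact motif match.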

Modifications to molecules may include the addition, deletion or substitution 
of residues. Substitutions may be conservative substitutions. Modified molecules may have 
substantially the same properties and characteristics as the molecules from which they are
derived. Modified molecules may be homologues of the molecules from which they are 
derived. They may for example have at least 40% homology, for example 50%, 60%, 70%, 
80%, 90% or 95% homology. 
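
As an illustration of how such a percentage threshold might be checked, the sketch below computes percent identity over the aligned, ungapped positions of two pre-aligned sequences. Reading "homology" as percent identity is an assumption made here for simplicity; the specification does not define the measure. The function name and the two toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical toy sequences differing at two of ten aligned positions.
parent = "ACDEFGHIKL"
modified = "ACDQFGHVKL"
score = percent_identity(parent, modified)
print(score, score >= 40.0)
```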

The present inventors have isolated and identified (see "Experimental"
section) a site - the recognition site - in the procollagen C-propeptide which contains a
sequence which is necessary and sufficient to determine the type-specific assembly of the
moieties to which it is attached. The recognition site is defined as being the part of the
C-propeptide containing the sequence (the recognition sequence) which, in an alignment plot
of the C-propeptide against other C-propeptides, corresponds to the sequence in the region
between the junction points B and G (Figure 2). Alignment plots may be done using the
PILE-UP program on SEQNET at the Daresbury Laboratories, UK. An existing proα chain
which has been substituted at the recognition sequence and as a result has different
properties or characteristics is considered to be a molecule comprising a proα chain
C-propeptide and an alien collagen α-chain since the C-propeptide is novel, all collagen
α-chains therefore being alien to it.

As can be seen from Figure 2, the recognition sequences contain a region of
homologous amino acids. Substitution of the conserved residues (see "Experimental"
section below) has not disrupted chain selection nor has it prevented the formation of a
correctly aligned helix, and so it appears that the conserved residues are not involved in
chain selection. Hence the recognition sequence, although comprising a continuous
sequence of about 23 amino acids, may be considered to have the chain selection properties
contained within a discontinuous variable sequence. For example in the recognition
sequence of alpha 1(III) (SEQ ID NO: 6) the discontinuous variable sequence may be
considered to comprise residues 1-12 and 21-23.
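
The residue numbering in this example can be made concrete with a short sketch. It uses a 23-character placeholder string in place of SEQ ID NO: 6 (the actual sequence is not reproduced here), and a hypothetical helper `pick_residues` that takes 1-based inclusive ranges, matching the numbering used in the text.

```python
def pick_residues(sequence, ranges):
    """Concatenate the residues in the given 1-based, inclusive (start, end) ranges."""
    return "".join(sequence[start - 1:end] for start, end in ranges)


# Placeholder only: 23 distinct letters standing in for a 23-residue recognition
# sequence, so that each position is easy to track. Not a real protein sequence.
placeholder = "ABCDEFGHIJKLMNOPQRSTUVW"

# Residues 1-12 and 21-23 form the discontinuous variable sequence in the example.
variable_part = pick_residues(placeholder, [(1, 12), (21, 23)])
# The intervening residues 13-20 would then be the conserved stretch.
conserved_part = pick_residues(placeholder, [(13, 20)])

print(variable_part, conserved_part)
```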

The C-propeptide and/or the recognition sequence may be that of a fibrillar
proα chain. More generally, the C-propeptide may be an existing C-propeptide, for example
a C-propeptide found in nature, or it may be a partially modified form of or an analogue (i.e.
possessing substantially the same properties and characteristics but having a different
sequence) of an existing proα chain C-propeptide, or it may comprise a novel C-propeptide
(i.e. a C-propeptide having significantly different properties or characteristics to other
C-propeptides) and may for example have different binding kinetics or α-chain selection
properties.

The existing C-propeptide may be selected from any one of the group of the
proα1(I), proα2(I), proα1(II), proα1(III), proα1(V), proα2(V), proα3(V), proα1(XI),
proα2(XI), and proα3(XI) proα chain C-propeptides or a partially modified form thereof or an
analogue thereof. Partially modified forms of procollagen C-propeptides include the
recognition sequences of C-propeptides, for example those identified in Figure 3 of the
accompanying drawings in relation to the C-propeptides from which they are derived. In
some embodiments of the invention, such modified forms may be the only, or substantially
the only, elements derived from a C-propeptide; in other words, no other
C-propeptide-derived sequences need be present. However, this will not always be the case, as the
invention also encompasses the presence of other parts of the C-propeptide including, of
course, the balance of it.

The C-propeptide may comprise an existing C-propeptide or a partially
modified form thereof or an analogue thereof substituted at the recognition site. The
C-propeptide may be substituted at the recognition site with the recognition sequence of an
existing C-propeptide, for example that of a different C-propeptide. It may for example be
substituted at the recognition site with the recognition sequence of the C-propeptide of any
one of the group of proα1(III), proα1(I), proα2(I), proα1(II), proα1(V), proα2(V),
proα1(XI), proα2(XI) and proα3(XI) proα chains. It may be substituted at the recognition
site with a recognition sequence having the sequence of any one of the group of SEQ ID
NOS: 6-13. The recognition sequence of a C-propeptide which has been modified for
example by addition, deletion or substitution of amino acid residues yet which has
substantially the same properties and/or characteristics is considered to be essentially that
of an existing C-propeptide. The recognition sequence may generally be at least 40%
homologous, or even at least 50, 60, 70, 80, 90 or 95% homologous to the sequence from
which it was derived.
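
At the sequence level, substitution at the recognition site amounts to replacing one contiguous stretch of the C-propeptide with a recognition sequence from elsewhere. The sketch below shows that operation on placeholder strings; the function name, the coordinates and the toy sequences are all hypothetical, and in practice the site boundaries would come from an alignment such as that of Figure 2.

```python
def substitute_recognition_sequence(c_propeptide, site_start, site_end, new_recognition):
    """Replace residues site_start..site_end (1-based, inclusive) with new_recognition."""
    if not (1 <= site_start <= site_end <= len(c_propeptide)):
        raise ValueError("recognition site coordinates fall outside the C-propeptide")
    return c_propeptide[:site_start - 1] + new_recognition + c_propeptide[site_end:]


# Placeholder host C-propeptide: lower-case flanks around an upper-case "site".
host = "aaaaa" + "XXXX" + "bbb"
# Placeholder donor recognition sequence; it need not have the same length.
donor_recognition = "YYYYY"

hybrid = substitute_recognition_sequence(host, 6, 9, donor_recognition)
print(hybrid)
```

Because the donor stretch may differ in length from the stretch it replaces, residue numbering downstream of the site can shift in the hybrid.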

Such a substitution at the recognition site may significantly affect the
properties or characteristics of the C-propeptide.

Alternatively, the recognition sequence may be novel. Such a novel
recognition sequence may for example give the first moiety novel binding kinetics or
specificity for a novel first moiety or a novel set of first moieties.

The second moiety is a molecular component which may be anything bound
to the first moiety. This may include, for example, alien collagen α-chain molecules, or
other proteins or fragments of proteins, such as antibodies or antigen binding fragments
thereof, or combinations thereof. Proteins constituting or contributing to the second moiety
may be glycosylated or otherwise post-translationally modified. By "alien collagen α-chain"
is meant a collagen α-chain which does not form part of a proα chain with the C-propeptide
in nature; collagen α-chains comprise a triple helical forming domain, and an N-propeptide
may also be present. Other collagen α-chains from the same species, as well as those from
different species, may be used. Included as collagen α-chains which do not form part of a
proα chain with the C-propeptide in nature are partially modified forms and analogues of
existing collagen α-chains which form part of a proα chain with the C-propeptide in nature
and which do not significantly affect the relevant properties or characteristics of the
procollagen molecule, such as binding specificity. Partially modified forms and analogues
of collagen α-chains may, for example, have additions, deletions or substitutions which do
not significantly affect the relevant properties or characteristics of the C-propeptide or
collagen α-chain.

By means of the invention, therefore, novel collagens may be produced. Such
novel collagens have combinations of α-chains which are not seen in nature because of the
assembly-directing effect of the natural C-propeptides. The invention allows the protein
engineer to construct novel collagens having a non-natural combination of α-chains. The
invention therefore extends to a procollagen molecule comprising a non-natural
combination of α-chains. Non-natural procollagen homotrimers and heterotrimers,
including all the possible trimers not mentioned in Table 1, are within the scope of the
invention.



The second moiety may comprise at least a collagen α-chain. A collagen
α-chain may be selected from any one of the group of proα1(I) chain, proα2(I) chain,
proα1(II) chain, proα1(III) chain, proα1(V) chain, proα2(V) chain, proα3(V) chain,
proα1(XI) chain, proα2(XI) chain, and proα3(XI) chain collagen α-chains.

The second moiety may also comprise a proα chain N-propeptide. An
N-propeptide may be selected from any one of the group of the proα1(I), proα2(I), proα1(II),
proα1(III), proα1(V), proα2(V), proα3(V), proα1(XI), proα2(XI), and proα3(XI) proα
chain N-propeptides.

The second moiety may comprise a collagen α-chain and N-propeptide which
are naturally associated (for example those of the proα2(I) chain), or it may comprise a
non-natural combination of collagen α-chain and N-propeptide. Depending upon the host
organism in which it may be desired to express molecules of the invention, the N-terminal
propeptide may be replaced or adapted to facilitate secretion or other handling or processing
in the expression system.

The molecule may comprise a first moiety having the activity of the
proα1(III) C-propeptide attached to a second moiety comprising the collagen α-chain and
N-propeptide of the proα2(I) chain. The molecule may have the sequence of SEQ ID NO: 4.

In the natural formation of a collagen molecule in vivo, the N- and
C-propeptides are cleaved off the procollagen molecule to yield a collagen molecule during
the formation of polymeric collagen. Consequently, the invention includes within its scope
a collagen molecule comprising a non-natural combination of α-chains. Non-natural
collagen homotrimers and heterotrimers, including all the possible collagen trimers not
mentioned in Table 1, are within the scope of the invention. (If for any reason it is desired
to have a non-natural collagen molecule with a C-propeptide but not an N-propeptide, or
vice versa, the enzymes responsible for processing in the chosen expression system may be
manipulated or selected accordingly or the sequence of the molecule modified to make it
susceptible to enzymatic processing as appropriate.)

Collagen molecules naturally self-assemble into collagen fibrils, which in turn 
aggregate to form a collagen fibre. Collagen fibrils and collagen fibres comprising collagen 
molecules as described above are therefore also contemplated by the invention. 

Molecules of the first aspect of the invention may be prepared by any
convenient method, including peptide ligation and complete synthesis. It is preferred,
however, that the molecules be prepared by expression from a recombinant DNA system.
For this purpose, and according to a second aspect of the invention, there is provided a
DNA molecule, which may be in recombinant or isolated form, encoding a molecule as
described above (particularly a non-natural procollagen α-chain).

Recombinant DNA in accordance with the invention may be in the form of 
a vector. The vector may for example be a plasmid, cosmid or phage. Vectors will 
frequently include one or more selectable markers to enable selection of cells transfected 
(or transformed: the terms are used interchangeably in this specification) with them and, 
preferably, to enable selection of cells harbouring vectors incorporating heterologous DNA. 
Appropriate start and stop signals may be present. The vector may be an expression vector 
having regulatory sequences to drive expression. Vectors not including regulatory
sequences are useful as cloning vectors; and, of course, expression vectors may also be 
useful as cloning vectors. 

Cloning vectors can be introduced into E. coli or another suitable host which
facilitates their manipulation. According to another aspect of the invention, there is therefore
provided a host cell transfected or transformed with DNA as described above. Such host
cells may be prokaryotic or eukaryotic. Eukaryotic hosts may include yeasts, insect and
mammalian cell lines. Expression hosts may be stably transformed. Unstable and cell-free
expression systems may be used in appropriate circumstances, but it is unlikely that they
will be favoured, at the present state of technology, for bulk production.

DNA of the invention may also be in the form of a transgene construct 
designed for expression in a transgenic plant or, preferably, animal. In principle, the 
invention is applicable to all animals, including birds such as domestic fowl, amphibian 
species and fish species. The protein may be harvested from body fluids or other body 
products (such as eggs, where appropriate). In practice, it will be to (non-human) mammals, 
particularly placental mammals, that the greatest commercially useful applicability is 
presently envisaged, as expression in the mammary gland, with subsequent optional 
recovery of the expression product from the milk, is a proven and preferred technology. It 
is with ungulates, particularly economically important ungulates such as cattle, sheep, goats, 
water buffalo, camels and pigs that the invention is likely to be most useful. The generation
and usefulness of such mammalian transgenic mammary expression systems are both
generally, and in certain instances specifically, disclosed in WO-A-8800239 and
WO-9005188. The β-lactoglobulin promoter is especially preferred for use in transgenic
mammary expression systems. WO-A-9416570 purports to disclose the production of
human recombinant collagen in the milk of transgenic animals but contains no experimental 
details of such production having taken place. 

Expression hosts, particularly transgenic animals, may contain other 
exogenous DNA to facilitate the expression, assembly, secretion and other aspects of the 
biosynthesis of molecules of the invention. For example, expression hosts may co-express 
prolyl 4-hydroxylase, which is a post-translational enzyme important in the natural 
biosynthesis of procollagens, as disclosed in WO-9307889. 

DNA, particularly cDNA, encoding natural procollagen chains is known and
available in the art. For example, WO-A-9307889, WO-A-9416570 and the references cited
in both of them give details. Such DNA forms a convenient starting point for DNA of the
present invention, which may be prepared by recombinant techniques from it. While in
general terms DNA encoding a C-propeptide (or a minimal essential region from it) may
simply be ligated to DNA encoding an alien collagen triple helical domain (usually attached
to DNA encoding the corresponding N-propeptide), in practice it is useful to use PCR-based
techniques to effect the precise ligation. For example, PCR products flanking the junction
region between the C-propeptide and the triple helical domain may be prepared and
combined; an overlap extension reaction can then be carried out to yield a PCR product
which is a hybrid between DNA encoding the C-propeptide of one procollagen chain and
DNA encoding the triple helical domain (and the N-propeptide, usually) of another
procollagen chain.
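
The logic of the overlap extension step can be sketched at the string level: two products that share an identical stretch at the junction are merged into a single hybrid through that shared stretch. This is an in silico illustration only, not a model of the PCR itself (strands, primers and polymerase are ignored); the function name, the `min_overlap` parameter and the toy sequences are hypothetical.

```python
def overlap_extend(upstream, downstream, min_overlap=8):
    """Merge two fragments where the end of `upstream` equals the start of `downstream`.

    The longest shared overlap of at least `min_overlap` bases is used.
    """
    max_len = min(len(upstream), len(downstream))
    for size in range(max_len, min_overlap - 1, -1):
        if upstream.endswith(downstream[:size]):
            return upstream + downstream[size:]
    raise ValueError("fragments share no overlap of the required length")


# Hypothetical toy products: both carry the same 12-base junction sequence.
junction = "CCAGGTGATCAG"
helix_product = "ATGGGTCCT" + junction          # triple helical domain side
c_pro_product = junction + "CCTTTGCATGA"        # C-propeptide side

hybrid = overlap_extend(helix_product, c_pro_product)
print(hybrid)
```

Requiring a minimum overlap length guards against joining fragments through a short, coincidental match.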

The invention is in principle capable of accommodating the use of synthetic
DNA sequences, cDNAs, full genomic sequences and "minigenes", which is to say partial
genomic sequences containing some, but not all, of the introns present in the full length
gene. Because of the large number of introns present in collagen genes in general, though,
experimental practicalities will usually favour the use of cDNAs or, in some circumstances,
minigenes.

DNA in accordance with the invention can in principle be prepared by any 
convenient method involving coupling together successive nucleotides, and/or ligating 
oligo- and/or poly-nucleotides, including in vitro processes, but recombinant DNA
technology forms the method of choice. 

Molecules of the invention may be useful in a method of treatment or 
diagnosis of the human or animal body. The invention therefore extends to molecules as 
described above for use in medicine. 

The molecule may be for use as an adhesive or implant. It may be for use in
promoting the healing of wounds or fibrotic disorders with reduced scarring. It may be for
use in promoting the healing of chronic wounds. By "wounds or fibrotic disorders" is meant
any condition which may result in the formation of scar tissue. In particular, this includes
the healing of skin wounds, the repair of tendon damage, the healing of crush injuries, the
healing of central nervous system (CNS) injuries, conditions which result in the formation
of scar tissue in the CNS, scar tissue formation resulting from strokes, and tissue adhesion,
for example, as a result of injury or surgery (this may apply to e.g. tendon healing and
abdominal strictures and adhesions). Examples of fibrotic disorders include pulmonary
fibrosis, glomerulonephritis, cirrhosis of the liver and proliferative vitreoretinopathy.

For example in the inhibition of fibrosis, a novel collagen molecule or proα
chain may be applied to a site of wounding or fibrosis, the novel collagen (or proα chain)
inhibiting collagen fibril formation and thus fibrosis. The novel collagen or proα chain may
for example have a shortened α-chain.

DNA of the invention may be useful, in appropriate constructs, in a method
of gene therapy. It may be for use in the treatment of Osteogenesis Imperfecta (OI),
Ehlers-Danlos Syndrome (EDS), Stickler Syndrome, Spondyloepiphyseal dysplasia,
Hypochondrogenesis or Aortic Aneurysms. Mutations within collagen genes are the cause
of most forms of OI, some forms of EDS and of some chondrodysplasias. In most cases the
devastating effects of the disease are due to substitutions of glycine within the triple helical
domain - the Gly-X-Y repeat containing region - for amino acids with bulkier side chains.
This results in triple helix folding being prevented or delayed with the consequence that
there is a drastic reduction in the secretion of trimerised proα chains. The malfolded
proteins may be retained within the cell, probably within the ER (endoplasmic reticulum),
where they are degraded. As the folding of the C-propeptide is not affected by these
mutations within the triple helical domain, C-propeptides from wild-type as well as mutant
chains associate and may be retained within the cell. The retention and degradation of
wild-type chains due to their interaction with mutant chains amplifies the effect of the mutation
and has been termed "procollagen suicide". The massive loss of protein due to this
phenomenon may explain the dominant lethal effects of such mutations. By engineering
proα chains having altered chain selectivity, proα chains may be produced which do not
associate with the mutant chains, and will therefore fold and be secreted normally. Such
engineered proα chains may contain the wild-type collagen α-chain, thereby making up for
the deficit caused by the mutant collagen α-chain. Expressed protein may in some
circumstances also be useful in the treatment of diseases and conditions which could be
addressed at a more fundamental level by gene therapy.
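
The amplifying effect described for "procollagen suicide" can be illustrated with a small calculation. The sketch assumes, for illustration only, that a given fraction of one chain type is mutant (one half, as in a heterozygote with equal expression of both alleles) and that mutant and wild-type chains enter trimers at random, which is in keeping with the statement that C-propeptide folding is unaffected. The function name and the numerical inputs are hypothetical and are not taken from the specification.

```python
def fraction_of_trimers_with_mutant(copies_in_trimer, mutant_fraction=0.5):
    """Fraction of trimers holding at least one mutant copy, assuming random association."""
    wild_type_only = (1.0 - mutant_fraction) ** copies_in_trimer
    return 1.0 - wild_type_only


# A chain present once per trimer (e.g. proα2(I) in type I procollagen).
one_copy = fraction_of_trimers_with_mutant(1)
# A chain present twice per trimer (e.g. proα1(I) in type I procollagen).
two_copies = fraction_of_trimers_with_mutant(2)
# A homotrimer (e.g. proα1(III) in type III procollagen).
three_copies = fraction_of_trimers_with_mutant(3)

print(one_copy, two_copies, three_copies)
```

Under these assumptions, a mutation carried by half of the chains compromises well over half of the trimers whenever the chain occurs more than once per trimer, which is the sense in which the effect of the mutation is amplified.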

The invention may also be useful in photography, brewing, foodstuffs, textiles
or adhesives.

Also provided according to the present invention is a method of treatment or 
diagnosis of the human or animal body comprising the use of a molecule according to the 
present invention. 

The invention will be further apparent from the examples, which comprise the 
following Figures and description of experiments, of which: 

Figure 1 shows the initial stages in the intra-cellular folding, assembly
and modification of procollagen. As can be seen, co-translational translocation and signal
peptide cleavage occur at stage 1. Intra-molecular disulphide bond formation then
takes place, as well as N-linked glycosylation, proline isomerisation and proline
hydroxylation, at stage 2. There then follows at stage 3 type-specific assembly of the proα
chains by trimerisation and inter-molecular disulphide bond formation. Finally, at stage 4,
triple helix formation proceeds in a carboxy-to-amino direction to give trimerised proα
chains;

Figure 2 shows an alignment plot made using the PILE-UP program on
SEQNET at the Daresbury Laboratories, UK using default settings, of the C-propeptides
of proα chains of type I and type III procollagen. Alpha 1(I) is SEQ ID NO: 14; alpha 2(I)
is residues number 284-534 of SEQ ID NO: 1; and alpha 1(III) is residues number 379-626
of SEQ ID NO: 2. C-proteinase cleavage sites (marked CP) are between A and D (alpha
1(I)), A and D (alpha 2(I)) and G and D (alpha 1(III)). Junction points A, F, B, C and G are
as shown. Numbers indicate conserved cysteine residues. # indicates identical amino acids
and - indicates amino acids with conserved side chains;

Figure 3 shows recognition sequences for proα1(I), proα2(I), proα1(II),
proα1(III), proα1(V), proα2(V), proα1(XI) and proα2(XI) proα chains having SEQ ID
NOs: 7, 8, 9, 6, 10, 11, 12, and 13 respectively and which were identified by alignment plots
of C-propeptides against other C-propeptides (specifically, those of Figure 2) as
corresponding to the sequences in the regions between junction points B and G of Figure
2;

Figure 4 shows an SDS-PAGE gel of translated procollagen constructs.
Lanes are as follows: 1 - molecular weight markers; 2 and 6 - proα1(III)Δ1; 3 and 7 -
proα2(I)Δ1; 4 and 8 - proα2(I):(III)CP; 5 and 9 - proα1(III):(I)CP;

Figure 5 shows an SDS-PAGE gel of translated procollagen constructs.
Lanes are as follows: 1 - molecular weight markers; 2 and 7 - A-join; 3 and 8 - F-join; 4 and
9 - B-join; 5 and 10 - C-join; 6 and 11 - recip-C-join;

Figure 6 shows translated procollagens in the presence (Lanes 3, 5 and
7) and absence (Lanes 2, 4 and 6) of α,α'-dipyridyl. Lanes are as follows: 2 and 3 -
proα2(I):(III)CP; 4 and 5 - BGR; 6 and 7 - proα1(III):(I)CP;

Figure 7 shows an SDS-PAGE gel of translated procollagen constructs
under reducing (lanes 1-3) and non-reducing (lanes 4-6) conditions. Lanes are as follows:
1 and 4 - BGR^"^; 2 and 5 - BGR; 3 and 6 - BGR^'^^; and

Figure 8 shows an SDS-PAGE gel of translated procollagen constructs.
Lanes are as follows: 1 - molecular weight markers; 2 - BGR^ 3 - BGR; 4 - BGR^"^.

EXPERIMENTAL



A cDNA clone coding for a truncated proα2(I) chain (designated proα2(I)Δ1;
SEQ ID NO: 1; nucleic acid coding sequence - SEQ ID NO: 19) was constructed from two
partial cDNA clones, pHf1131 and pHp2010 (Kuivaniemi, H. et al., 1988, Biochem. J., 252:
633-640) which were sub-cloned into the EcoRI site of pBluescript SK^. A 3.4 kb
PstI fragment containing the vector and the 5'-terminal 0.5 kb of the gene was isolated from
pHp2010 and ligated to a 1.4 kb PstI fragment derived from pHf1131 encoding the 3'
terminus. The resultant recombinant, proα2(I)Δ1, has a 2.2 kb deletion in the coding
sequence (Lees and Bulleid, 1994, J. Biol. Chem., 269: 24354-24360).

This construct was analysed using a semi-permeabilised (SP) HT1080 cell 
system as described by Wilson et al. (1995, Biochem. J., 307: 679-687). Semi-permeabilised 
cells were prepared from HT1080 cells. Confluent HT1080 cells from a 75 cm² flask were 
rinsed once with PBS (phosphate buffered saline), then incubated with 2 ml of PBS 
containing 2.5 mg/ml trypsin for 3 minutes at room temperature. The flask was transferred 
to ice, where 8 ml of ice-cold KHM (110 mM KOAc, 20 mM Hepes, pH 7.2, 2 mM 
MgOAc) containing 100 µg/ml soyabean trypsin inhibitor was added and the cells released 
from the plate. Cells were pelleted at 12,000 rpm for 3 minutes and resuspended in 6 ml of 
KHM containing 40 µg/ml digitonin (diluted from a 40 mg/ml stock in DMSO (dimethyl 
sulfoxide)) and incubated on ice for 5 minutes. To terminate permeabilisation, 8 ml of KHM 
was added and cells were pelleted and resuspended in 50 mM Hepes, pH 7.2, 90 mM 
KOAc. After 10 minutes the cells were pelleted and resuspended in 100 µl of KHM 
(approximately 2x10* cells). Endogenous mRNA was removed by adding CaCl2 to 1 mM 
and Staphylococcal nuclease to 10 ng/ml and incubating at room temperature for 12 
minutes. The reaction was terminated by the addition of EGTA to 4 mM, and pelleting the 
cells. Semi-permeabilised cells were resuspended in 100 µl of KHM. RNA was translated 
using a rabbit reticulocyte lysate (FlexiLysate, Promega, Southampton, U.K.) for 1 hour at 
30 °C. The translation reaction (25 µl) contained 16 µl reticulocyte lysate, 1 µl 1 mM amino 
acids (minus methionine), 15 µCi L-[35S]methionine, 1 µl transcribed RNA and semi- 
permeabilised cells (approx. 1 x 10^). After translation, N-ethylmaleimide was added to a 
final concentration of 20 mM. The formation of disulphide bonds was verified by 
comparative gel electrophoresis on a 7.5% polyacrylamide gel of translation products run 
under reducing and non-reducing conditions. 
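
The working concentrations in this protocol follow from simple stock dilutions. As a minimal sketch (the helper name `stock_volume_ul` is hypothetical and not part of the described method), and taking the digitonin working concentration as 40 µg/ml in 6 ml of KHM, the volume of the 40 mg/ml stock can be checked with C1·V1 = C2·V2:

```python
def stock_volume_ul(stock_mg_per_ml, final_ug_per_ml, final_volume_ml):
    """Volume of stock (in microlitres) needed, from C1 * V1 = C2 * V2."""
    stock_ug_per_ml = stock_mg_per_ml * 1000.0            # mg/ml -> ug/ml
    volume_ml = final_ug_per_ml * final_volume_ml / stock_ug_per_ml
    return volume_ml * 1000.0                              # ml -> ul

# Digitonin: 40 mg/ml stock in DMSO, 40 ug/ml final, 6 ml of KHM.
digitonin_ul = stock_volume_ul(40, 40, 6)
print(f"{digitonin_ul:.1f} ul of stock")
```

This is a 1:1000 dilution, so only a few microlitres of the DMSO stock are added to the 6 ml of buffer.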

When analysed using this cell-free system, the translation product from 
proα2(I)Δ1 mRNA did not self-associate to form homotrimers, indicating that it does not 
contain the information necessary for the initial recognition event (Figure 4, lanes 3 and 7). 

A cDNA clone coding for a truncated proα1(III) chain (designated 
proα1(III)Δ1; SEQ ID NO: 2; nucleic acid coding sequence - SEQ ID NO: 20) was 
constructed from a full-length type III procollagen cDNA which was constructed from two 
partial cDNAs, pS413 and pS31 (Ala-Kokko et al., 1989, Biochem. J., 260: 509-516). Each 
cDNA was subcloned into the EcoRI site of pBluescript SK+. A 4.7 kb SalI (restriction 
enzyme) fragment containing the vector and the 5'-terminal 1.8 kb was isolated from pS413 
and ligated to a 3.6 kb SalI fragment derived from pS31 to produce proα1(III). An internal 
2.5 kb XhoI fragment was excised from proα1(III) and the parental plasmid re-ligated to 
create proα1(III)Δ1 (Lees and Bulleid, 1994, J. Biol. Chem., 269: 24354-24360). 

The translation product from proα1(III)Δ1 mRNA was able to assemble to 
form a homotrimer, as judged by its ability to form inter-chain disulphide bonded dimers and 
trimers when translated in a semi-permeabilised cell-free translation system. This 
demonstrated that it contained the information required for self-assembly (Figure 4, lanes 
2 and 6). 

Hybrid cDNA clones were prepared which contained sequences derived from 
proα1(III)Δ1 and proα2(I)Δ1. The C-proteinase cleavage site of proα1(III)Δ1 was, for these 
experiments, taken to be between Ala 377 and Pro 378 of SEQ ID NO: 2, instead of 
between Gly 381 and Asp 382 of SEQ ID NO: 2 as shown in Figure 2. In the first of these 
constructs the coding sequence for the C-propeptide from the proα1(III)Δ1 chain was 
replaced by that for the C-propeptide from the proα2(I) chain, the resulting chimera being 
designated proα1(III):(I)CP (SEQ ID NO: 3). This construct failed to self-associate when 
translated in the cell-free system (Figure 4, lanes 5 and 9). A reciprocal construct was made 
in which the C-propeptide from proα2(I)Δ1 was replaced with the C-propeptide from the 
proα1(III) chain, with the resulting chimera designated proα2(I):(III)CP (SEQ ID NO: 4). 

This construct was able to self-associate to form dimers and homotrimers 
(Figure 4, lanes 4 and 8), demonstrating directly for the first time that all the information 
required for selective association resides within the C-propeptide. The construct 
proα1(III):(I)CP was prepared as described below. Other constructs were produced using 
the same approach and published sequences. 

The hybrid cDNA clones were prepared using a PCR-based approach. For the 
construction of proα1(III):(I)CP, a PCR product was prepared from proα1(III)Δ1 with two 
primers, one of which (SEQ ID NO: 15; JL-35) hybridised within the triple helical domain 
whilst the other (SEQ ID NO: 16; JL-32) hybridised with 21 nucleotides upstream from the 
junction point at the C-propeptide. This primer also contained an overlap of 21 nucleotides 
which were complementary to the first 21 nucleotides of the C-propeptide of proα2(I)Δ1. 
This gave a 0.25 kb PCR product. 

A second PCR product was prepared from proα2(I)Δ1 with two primers, one of 
which (SEQ ID NO: 17; JL-31Kpn) hybridised downstream from the stop codon for 
translation within the 3'-non-translated region. This primer also contained a KpnI site. The 
other primer (SEQ ID NO: 18; JL-36) hybridised with the first 18 nucleotides of the C- 
propeptide of proα2(I)Δ1. This gave a 0.85 kb PCR product. 

The two PCR products were combined and a third PCR (an overlap extension) 
was carried out with primers JL-35 (SEQ ID NO: 15) and JL-31Kpn (SEQ ID NO: 17) to 
yield a 1.1 kb product. This was cut with XhoI and KpnI and subcloned into XhoI and KpnI 
cut proα1(III)Δ1 to yield proα1(III):(I)CP. 
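
The overlap extension step can be pictured as joining two fragments through a short shared sequence. The following is only an illustrative sketch: the function `splice_by_overlap` and the toy nucleotide strings are hypothetical and are not taken from the primers or SEQ ID listings. It also checks the approximate size arithmetic, in which the shared 21 nucleotides are counted once.

```python
def splice_by_overlap(upstream, downstream, min_overlap=18):
    """Join two fragments where the 3' end of `upstream` matches the 5' end of `downstream`."""
    longest = min(len(upstream), len(downstream))
    for size in range(longest, min_overlap - 1, -1):
        if upstream[-size:] == downstream[:size]:
            return upstream + downstream[size:]
    raise ValueError("no overlap of the required length was found")

# Toy sequences (hypothetical placeholders for the two PCR products).
overlap = "GATCAGCCTCGTTCAGCACCT"              # 21 nt shared by both products
product_1 = "ATGGCTGGTCCTCCAGGT" + overlap       # stands in for the 0.25 kb product
product_2 = overlap + "TCTCTCAGACCCAAGTAA"       # stands in for the 0.85 kb product

fused = splice_by_overlap(product_1, product_2)
print(len(product_1), len(product_2), len(fused))

# Approximate size of the fused product from the stated fragment sizes (base pairs).
expected_bp = 250 + 850 - 21
print(expected_bp)
```

With the stated fragment sizes the estimate comes to roughly 1.1 kb, in line with the product described above.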

A variety of hybrid constructs were then prepared in which parts of the 
proα2(I)Δ1 C-propeptide sequence were replaced with the corresponding region from the 
proα1(III) C-propeptide. The various regions are outlined in Figure 2 with the junction 
points designated as A, F, B, C, and G. So, for example, the A-join molecule contains all of 
the proα2(I)Δ1 sequence up to but not including the A site (i.e. ... DY) with all of the 
sequence carboxy-terminal to this site (i.e. EI ...) being derived from the corresponding region 
from the C-propeptide of the proα1(III) chain. Proα1(III) and proα2(I) C-propeptides differ 
in their complement of cysteine residues (and hence in their ability to form disulphide 
bonds), with proα2(I) lacking the Cys 2 residue (Figure 2), instead having a serine residue. 
In order to ease analysis under non-reducing conditions (see below) the F, B and C 
constructs contained a serine to cysteine mutation at the Cys 2 site of the proα2(I) chain. 
To ensure that this mutation played no role in chain selection, a similarly mutated construct 
(proα2(I):(III) BGR^S-C - also referred to as BGR^S-C; see below) was back-mutated. The back- 
mutated construct (i.e. proα2(I):(III) BGR - also referred to as BGR) had the same chain 
selectivity as its parent molecule (proα2(I):(III) BGR^S-C) (see below). 

The A-join, F-join and B-join chimeras all assembled to form homotrimers 
when translated in the cell-free system (Figure 5, lanes 7, 8 and 9). However, the C-join 
molecule did not assemble (Figure 5, lane 10), suggesting that the recognition site for 
assembly was contained within the sequence carboxy-terminal to the B-site and amino- 
terminal to the C-site. The possibility that the lack of assembly of the C-join molecule was 
due to this site being within the recognition sequence for assembly could not be ruled out. 
Evidence that the recognition site had been disrupted was obtained when the reciprocal 
construct was made. This construct contained the proα1(III)Δ1 chain up to the C-site with 
the rest of the C-propeptide being derived from the proα2(I) C-propeptide. No assembly 
occurred from this construct (recip-C-join), illustrating that the recognition site had been 
disrupted (Figure 5, lane 11). 

The next construct made contained all of the proα2(I)Δ1 sequence apart from 
a short stretch of 23 amino acids between the B-site and the G-site (SEQ ID NO: 6) which 
were replaced with the corresponding region from the C-propeptide of proα1(III). This 
construct (designated BGR; SEQ ID NO: 5) was altered by site directed mutagenesis of 
cysteine for serine at the Cys 2 site of the proα2(I) part of the molecule (i.e. cysteine was 
substituted for the serine 335 residue of SEQ ID NO: 5). The resultant molecule (designated 
proα2(I):(III) BGR^S-C) was shown to assemble to form inter-chain disulphide bonded 
homotrimers when translated in the cell-free system (Figure 7, lane 4), demonstrating that 
this short stretch of 23 amino acids contains all the information to drive homotrimer 
formation. 
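
The B-G replacement can be illustrated on a short stretch of sequence. The sketch below is not the cloning procedure itself: the function `swap_motif` is a hypothetical helper, and the four flanking residues on each side are those that border the B-G region in the proα2(I) sequence of SEQ ID NO: 1. It simply swaps the 23-residue proα2(I) motif (SEQ ID NO: 8) for the proα1(III) motif (SEQ ID NO: 6).

```python
# 23-residue B-G motifs in three-letter code.
B_G_ALPHA2_I = ("Asn Val Glu Gly Val Thr Ser Lys Glu Met Ala Thr "
                "Gln Leu Ala Phe Met Arg Leu Leu Ala Asn Tyr").split()    # SEQ ID NO: 8
B_G_ALPHA1_III = ("Gly Asn Pro Glu Leu Pro Glu Asp Val Leu Asp Val "
                  "Gln Leu Ala Phe Leu Arg Leu Leu Ser Ser Arg").split()  # SEQ ID NO: 6

def swap_motif(chain, old, new):
    """Replace the first occurrence of the residue list `old` in `chain` with `new`."""
    for start in range(len(chain) - len(old) + 1):
        if chain[start:start + len(old)] == old:
            return chain[:start] + new + chain[start + len(old):]
    raise ValueError("motif not found")

# Short proα2(I) stretch: four flanking residues either side of the B-G motif.
fragment = "Gln Phe Glu Tyr".split() + B_G_ALPHA2_I + "Ala Ser Gln Asn".split()
bgr_fragment = swap_motif(fragment, B_G_ALPHA2_I, B_G_ALPHA1_III)
print(" ".join(bgr_fragment))
```

Because both motifs are 23 residues long, the swap leaves the length of the chain and the flanking proα2(I) residues unchanged.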

To verify that the serine to cysteine mutation did not affect chain selection, 
a back mutation was made (i.e. to give proα2(I):(III) BGR) and homotrimer formation 
analysed. As expected, no inter-chain disulphide bonded trimers were detected (Figure 7, 
lane 5) as this molecule does not contain the cysteine residue required for inter-chain 
disulphide bond formation. 

To determine whether a stable triple helix was formed after translation of the 
chimeric procollagens, a simple protease protection assay was carried out. This involved 
treating the translation products with a combination of proteolytic enzymes (trypsin, 
chymotrypsin and pepsin). Isolated SP-cells following translation were resuspended in 0.5 
M acetic acid in 1% (v/v) Triton X-100 and incubated with pepsin (100 µg/ml) for 2 hours 
at 20 °C. Digestions were stopped by neutralisation with 1 M Tris base and proteins 
precipitated with ethanol at a final concentration of 27% (v/v). Precipitated protein from 
pepsin digests was resuspended in 50 mM Tris-HCl, pH 7.4 containing 0.15 M NaCl, 10 
mM EDTA (ethylenediaminetetra-acetic acid) and 1% Triton X-100. Chymotrypsin and 
trypsin were added to a final concentration of 250 µg/ml and 100 µg/ml respectively and 
samples incubated at room temperature for 2 minutes. The digestion was stopped by the 
addition of soyabean trypsin inhibitor to a final concentration of 500 µg/ml and 5 volumes 
of boiling SDS-PAGE (sodium dodecyl sulphate-polyacrylamide gel electrophoresis) 
sample buffer and boiling the samples for 3 minutes. The results are shown in Figures 6 and 
8. The formation of a stable triple helix is characterised by the appearance of a protease 
resistant fragment (corresponding to the triple helical domain) after digestion. Only the 
products of translation of the proα2(I):(III)CP (Figure 6, lane 2) and the BGR constructs 
(Figure 6, lane 4; Figure 8, lanes 2 and 3) generated a protease resistant fragment, which 
was only formed when α,α'-dipyridyl (an inhibitor of prolyl 4-hydroxylase) was not 
present during the translation (Figure 6). As proline hydroxylation is necessary for 
formation of a thermally stable triple helix, this demonstrates that a correctly folded triple 
helix was formed with these constructs. 

This also demonstrates that although BGR was not able to form trimers 
stabilised by inter-chain disulphide bonds, it was able to trimerise to form a correctly 
aligned triple helix. 

Analysis of the B-G motif from the proα1(III) and proα2(I) chains (Figure 3) 
shows that of the residues in the recognition sequences (Figure 3; SEQ ID NOs: 6 and 8 
respectively), residues 13-20 are identical with the exception of residue 17 - Leu (L) in 
proα1(III) and Met (M) in proα2(I). In order to determine the role played by these residues 
in the chain selection process, site directed mutagenesis was used to substitute Met for Leu 
in the proα1(III) recognition sequence in the proα2(I):(III) BGR^S-C construct (designated 
proα2(I):(III) BGR^L-M - also referred to as BGR^L-M), i.e. residue Leu 425 of SEQ ID NO: 
5 was replaced by Met, and residue Ser 335 was replaced by Cys. 
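
The comparison of the two recognition sequences can be reproduced directly from the listings. This is a small sketch in which the helper `differences` is a hypothetical name; the residues are those of SEQ ID NO: 6 and SEQ ID NO: 8 as given in the sequence listing.

```python
SEQ_ID_6 = ("Gly Asn Pro Glu Leu Pro Glu Asp Val Leu Asp Val "
            "Gln Leu Ala Phe Leu Arg Leu Leu Ser Ser Arg").split()   # proα1(III) B-G motif
SEQ_ID_8 = ("Asn Val Glu Gly Val Thr Ser Lys Glu Met Ala Thr "
            "Gln Leu Ala Phe Met Arg Leu Leu Ala Asn Tyr").split()   # proα2(I) B-G motif

def differences(a, b, start, end):
    """Return (position, residue_a, residue_b) for mismatches in the 1-based range start..end."""
    return [(i, a[i - 1], b[i - 1])
            for i in range(start, end + 1)
            if a[i - 1] != b[i - 1]]

print(differences(SEQ_ID_6, SEQ_ID_8, 13, 20))   # [(17, 'Leu', 'Met')]
```

Within residues 13-20 the only mismatch is at position 17, which is the residue targeted by the Leu to Met mutation.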

Chain assembly of proα2(I):(III) BGR^L-M was performed as described above 
and the electrophoretic mobility of the chains analysed. Under non-reducing conditions this 
construct formed inter-chain disulphide bonds (Figure 7, lane 6), and formed protease- 
resistant triple helical domains (Figure 8, lane 4). The substitution of Leu by Met did not, 
therefore, disrupt the process of chain selection nor did it prevent the formation of a 
correctly aligned helix. 

Table 1 

Type   Chains          Molecules               Distribution 

I      α1(I)           major [α1(I)]2α2(I)     widespread, skin, bone, tendon, 
       α2(I)           minor [α1(I)]3          ligament, cornea 

II     α1(II)          homotrimers [α1(II)]3   cartilage, notochord, 
                                               intervertebral disc, ear, 
                                               developing bone, eye, cornea 

III    α1(III)         homotrimers [α1(III)]3  widespread, particularly found 
                                               with type I collagen 

V      α1(V)           heterotrimers           widespread, particularly found 
       α2(V)                                   in cornea with type I collagen 
       α3(V) 

XI     α1(XI)          heterotrimers           cartilage, cornea and vitreous 
       α2(XI) 
       α3(XI) 
       α3(XI)=α1(II) 

SEQUENCE LISTING 

(1) GENERAL INFORMATION: 

(i) APPLICANT: 

(A) NAME: The Victoria University of Manchester 

(B) STREET: Oxford Road 

(C) CITY: Manchester 

(E) COUNTRY: GB 

(F) POSTAL CODE (ZIP) : M13 9PL 

(ii) TITLE OF INVENTION: Novel Procollagens 
(iii) NUMBER OF SEQUENCES: 20 

(iv) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30 (EPO) 

(vi) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: GB 9517773.9 

(B) FILING DATE: 31-AUG-1995 

(vi) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: GB 9606152.8 

(B) FILING DATE: 23-MAR-1996 

(vi) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: GB 9612476.3 

(B) FILING DATE: 14-JUN-1996 

(2) INFORMATION FOR SEQ ID NO: 1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 535 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : 

(D) TOPOLOGY: unknown 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1: 

Met Leu Ser Phe Val Asp Thr Arg Thr Leu Leu Leu Leu Ala Val Thr 
1 5 10 15 

Leu Cys Leu Ala Thr Cys Gln Ser Leu Gln Glu Glu Thr Val Arg Lys 
20 25 30 

Gly Pro Ala Gly Asp Arg Gly Pro Arg Gly Glu Arg Gly Pro Pro Gly 
35 40 45 

Pro Pro Gly Arg Asp Gly Glu Asp Gly Pro Thr Gly Pro Pro Gly Pro 
50 55 60 

Pro Gly Pro Pro Gly Pro Pro Gly Leu Gly Gly Asn Phe Ala Ala Gln 
65 70 75 80 

Tyr Asp Gly Lys Gly Val Gly Leu Gly Pro Gly Pro Met Gly Leu Met 
85 90 95 

Gly Pro Arg Gly Pro Pro Gly Ala Ala Gly Ala Pro Gly Pro Gln Gly 
100 105 110 

Phe Gln Gly Pro Ala Gly Glu Pro Gly Glu Pro Gly Gln Thr Gly Pro 
115 120 125 

Ala Gly Ala Pro Gly Pro His Gly Pro Val Gly Pro Ala Gly Lys His 
130 135 140 

Gly Asn Arg Gly Glu Thr Gly Pro Ser Gly Pro Val Gly Pro Ala Gly 
145 150 155 160 

Ala Val Gly Pro Arg Gly Pro Ser Gly Pro Gln Gly Ile Arg Gly Asp 
165 170 175 

Lys Gly Glu Pro Gly Glu Lys Gly Pro Arg Gly Leu Pro Gly Phe Lys 
180 185 190 

Gly His Asn Gly Leu Gln Gly Leu Pro Gly Ile Ala Gly His His Gly 
195 200 205 

Asp Gln Gly Ala Pro Gly Ser Val Gly Pro Ala Gly Pro Arg Gly Pro 
210 215 220 

Ala Gly Pro Ser Gly Pro Ala Gly Lys Asp Gly Arg Thr Gly His Pro 
225 230 235 240 

Gly Thr Val Gly Pro Ala Gly Ile Arg Gly Pro Gln Gly His Gln Gly 
245 250 255 

Pro Ala Gly Pro Pro Gly Pro Pro Gly Pro Pro Gly Pro Pro Gly Val 
260 265 270 

Ser Gly Gly Gly Tyr Asp Phe Gly Tyr Asp Gly Asp Phe Tyr Arg Ala 
275 280 285 

Asp Gln Pro Arg Ser Ala Pro Ser Leu Arg Pro Lys Asp Tyr Glu Val 
290 295 300 

Asp Ala Thr Leu Lys Ser Leu Asn Asn Gln Ile Glu Thr Leu Leu Thr 
305 310 315 320 

Pro Glu Gly Ser Arg Lys Asn Pro Ala Arg Thr Cys Arg Asp Leu Arg 
325 330 335 

Leu Ser His Pro Glu Trp Ser Ser Gly Tyr Tyr Trp Ile Asp Pro Asn 
340 345 350 

Gln Gly Cys Thr Met Glu Ala Ile Lys Val Tyr Cys Asp Phe Pro Thr 
355 360 365 

Gly Glu Thr Cys Ile Arg Ala Gln Pro Glu Asn Ile Pro Ala Lys Asn 
370 375 380 

Trp Tyr Arg Ser Ser Lys Asp Lys Lys His Val Trp Leu Gly Glu Thr 
385 390 395 400 

Ile Asn Ala Gly Ser Gln Phe Glu Tyr Asn Val Glu Gly Val Thr Ser 
405 410 415 

Lys Glu Met Ala Thr Gln Leu Ala Phe Met Arg Leu Leu Ala Asn Tyr 
420 425 430 

Ala Ser Gln Asn Ile Thr Tyr His Cys Lys Asn Ser Ile Ala Tyr Met 
435 440 445 

Asp Glu Glu Thr Gly Asn Leu Lys Lys Ala Val Ile Leu Gln Gly Ser 
450 455 460 

Asn Asp Val Glu Leu Val Ala Glu Gly Asn Ser Arg Phe Thr Tyr Thr 
465 470 475 480 

Val Leu Val Asp Gly Cys Ser Lys Lys Thr Asn Glu Trp Gly Lys Thr 
485 490 495 

Ile Ile Glu Tyr Lys Thr Asn Lys Pro Ser Arg Leu Pro Phe Leu Asp 
500 505 510 

Ile Ala Pro Leu Asp Ile Gly Gly Ala Asp His Glu Phe Phe Val Asp 
515 520 525 

Ile Gly Pro Val Cys Phe Lys 
530 535 

(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 626 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : 

(D) TOPOLOGY: unknown 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

Met Met Ser Phe Val Gln Lys Gly Ser Trp Leu Leu Leu Ala Leu Leu 
1 5 10 15 

His Pro Thr Ile Ile Leu Ala Gln Gln Glu Ala Val Glu Gly Gly Cys 
20 25 30 

Ser His Leu Gly Gln Ser Tyr Ala Asp Arg Asp Val Trp Lys Pro Glu 
35 40 45 

Pro Cys Gln Ile Cys Val Cys Asp Ser Gly Ser Val Leu Cys Asp Asp 
50 55 60 

Ile Ile Cys Asp Asp Gln Glu Leu Asp Cys Pro Asn Pro Glu Ile Pro 
65 70 75 80 

Phe Gly Glu Cys Cys Ala Val Cys Pro Gln Pro Pro Thr Ala Pro Thr 
85 90 95 

Arg Pro Pro Asn Gly Gln Gly Pro Gln Gly Pro Lys Gly Asp Pro Gly 
100 105 110 

Pro Pro Gly Ile Pro Gly Arg Asn Gly Asp Pro Gly Ile Pro Gly Gln 
115 120 125 

Pro Gly Ser Pro Gly Ser Pro Gly Pro Pro Gly Ile Cys Glu Ser Cys 
130 135 140 

Pro Thr Gly Pro Gln Asn Tyr Ser Pro Gln Tyr Asp Ser Tyr Asp Val 
145 150 155 160 

Lys Ser Gly Val Ala Val Gly Gly Leu Ala Gly Tyr Pro Gly Pro Ala 
165 170 175 

Gly Pro Pro Gly Pro Pro Gly Pro Pro Gly Thr Ser Gly His Pro Gly 
180 185 190 

Ser Pro Gly Ser Pro Gly Tyr Gln Gly Pro Pro Gly Glu Pro Gly Gln 
195 200 205 

Ala Gly Pro Ser Gly Pro Pro Gly Pro Pro Gly Ala Ile Gly Pro Ser 
210 215 220 

Gly Pro Ala Gly Lys Asp Gly Glu Ser Gly Arg Pro Gly Arg Pro Gly 
225 230 235 240 

Glu Arg Gly Leu Pro Gly Pro Pro Gly Ile Lys Gly Pro Ala Gly Ile 
245 250 255 

Pro Gly Phe Pro Gly Met Lys Gly His Arg Gly Phe Asp Gly Arg Asn 
260 265 270 

Gly Glu Lys Gly Glu Thr Gly Ala Pro Gly Leu Lys Gly Glu Asn Gly 
275 280 285 

Leu Pro Gly Glu Asn Gly Ala Pro Gly Pro Met Gly Pro Arg Gly Ala 
290 295 300 

Pro Gly Glu Arg Gly Arg Pro Gly Leu Pro Gly Ala Ala Gly Ala Arg 
305 310 315 320 

Gly Asn Asp Gly Ala Arg Gly Asn Arg Gly Glu Arg Gly Ser Glu Gly 
325 330 335 

Ser Pro Gly His Pro Gly Gln Pro Gly Pro Pro Gly Pro Pro Gly Ala 
340 345 350 

Pro Gly Pro Cys Cys Gly Gly Val Gly Ala Ala Ala Ile Ala Gly Ile 
355 360 365 

Gly Gly Glu Lys Ala Gly Gly Phe Ala Pro Tyr Tyr Gly Asp Glu Pro 
370 375 380 

Met Asp Phe Lys Ile Asn Thr Asp Glu Ile Met Thr Ser Leu Lys Ser 
385 390 395 400 

Val Asn Gly Gln Ile Glu Ser Leu Ile Ser Pro Asp Gly Ser Arg Lys 
405 410 415 

Asn Pro Ala Arg Asn Cys Arg Asp Leu Lys Phe Cys His Pro Glu Leu 
420 425 430 

Lys Ser Gly Glu Tyr Trp Val Asp Pro Asn Gln Gly Cys Lys Leu Asp 
435 440 445 

Ala Ile Lys Val Phe Cys Asn Met Glu Thr Gly Glu Thr Cys Ile Ser 
450 455 460 

Ala Asn Pro Leu Asn Val Pro Arg Lys His Trp Trp Thr Asp Ser Ser 
465 470 475 480 

Ala Glu Lys Lys His Val Trp Phe Gly Glu Ser Met Asp Gly Gly Phe 
485 490 495 

Gln Phe Ser Tyr Gly Asn Pro Glu Leu Pro Glu Asp Val Leu Asp Val 
500 505 510 

Gln Leu Ala Phe Leu Arg Leu Leu Ser Ser Arg Ala Ser Gln Asn Ile 
515 520 525 

Thr Tyr His Cys Lys Asn Ser Ile Ala Tyr Met Asp Gln Ala Ser Gly 
530 535 540 

Asn Val Lys Lys Ala Leu Lys Leu Met Gly Ser Asn Glu Gly Glu Phe 
545 550 555 560 

Lys Ala Glu Gly Asn Ser Lys Phe Thr Tyr Thr Val Leu Glu Asp Gly 
565 570 575 

Cys Thr Lys His Thr Gly Glu Trp Ser Lys Thr Val Phe Glu Tyr Arg 
580 585 590 

Thr Arg Lys Ala Val Arg Leu Pro Ile Val Asp Ile Ala Pro Tyr Asp 
595 600 605 

Ile Gly Gly Pro Asp Gln Glu Phe Gly Val Asp Val Gly Pro Val Cys 
610 615 620 

Phe Leu 
625 

(2) INFORMATION FOR SEQ ID NO : 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 623 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : 

(D) TOPOLOGY: unknown 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 
Met Met Ser Phe Val Gln Lys Gly Ser Trp Leu Leu Leu Ala Leu Leu 
1 5 10 15 

His Pro Thr Ile Ile Leu Ala Gln Gln Glu Ala Val Glu Gly Gly Cys 
20 25 30 

Ser His Leu Gly Gln Ser Tyr Ala Asp Arg Asp Val Trp Lys Pro Glu 
35 40 45 

Pro Cys Gln Ile Cys Val Cys Asp Ser Gly Ser Val Leu Cys Asp Asp 
50 55 60 

Ile Ile Cys Asp Asp Gln Glu Leu Asp Cys Pro Asn Pro Glu Ile Pro 
65 70 75 80 

Phe Gly Glu Cys Cys Ala Val Cys Pro Gln Pro Pro Thr Ala Pro Thr 
85 90 95 

Arg Pro Pro Asn Gly Gln Gly Pro Gln Gly Pro Lys Gly Asp Pro Gly 
100 105 110 

Pro Pro Gly Ile Pro Gly Arg Asn Gly Asp Pro Gly Ile Pro Gly Gln 
115 120 125 

Pro Gly Ser Pro Gly Ser Pro Gly Pro Pro Gly Ile Cys Glu Ser Cys 
130 135 140 

Pro Thr Gly Pro Gln Asn Tyr Ser Pro Gln Tyr Asp Ser Tyr Asp Val 
145 150 155 160 

Lys Ser Gly Val Ala Val Gly Gly Leu Ala Gly Tyr Pro Gly Pro Ala 
165 170 175 

Gly Pro Pro Gly Pro Pro Gly Pro Pro Gly Thr Ser Gly His Pro Gly 
180 185 190 

Ser Pro Gly Ser Pro Gly Tyr Gln Gly Pro Pro Gly Glu Pro Gly Gln 
195 200 205 

Ala Gly Pro Ser Gly Pro Pro Gly Pro Pro Gly Ala Ile Gly Pro Ser 
210 215 220 

Gly Pro Ala Gly Lys Asp Gly Glu Ser Gly Arg Pro Gly Arg Pro Gly 
225 230 235 240 

Glu Arg Gly Leu Pro Gly Pro Pro Gly Ile Lys Gly Pro Ala Gly Ile 
245 250 255 

Pro Gly Phe Pro Gly Met Lys Gly His Arg Gly Phe Asp Gly Arg Asn 
260 265 270 

Gly Glu Lys Gly Glu Thr Gly Ala Pro Gly Leu Lys Gly Glu Asn Gly 
275 280 285 

Leu Pro Gly Glu Asn Gly Ala Pro Gly Pro Met Gly Pro Arg Gly Ala 
290 295 300 

Pro Gly Glu Arg Gly Arg Pro Gly Leu Pro Gly Ala Ala Gly Ala Arg 
305 310 315 320 

Gly Asn Asp Gly Ala Arg Gly Asn Arg Gly Glu Arg Gly Ser Glu Gly 
325 330 335 

Ser Pro Gly His Pro Gly Gln Pro Gly Pro Pro Gly Pro Pro Gly Ala 
340 345 350 

Pro Gly Pro Cys Cys Gly Gly Val Gly Ala Ala Ala Ile Ala Gly Ile 
355 360 365 

Gly Gly Glu Lys Ala Gly Gly Phe Ala Asp Gln Arg Ser Ala Pro Ser 
370 375 380 

Leu Arg Pro Lys Asp Tyr Glu Val Asp Ala Thr Leu Lys Ser Leu Asn 
385 390 395 400 

Asn Gln Ile Glu Thr Leu Leu Thr Pro Glu Gly Ser Arg Lys Asn Pro 
405 410 415 

Ala Arg Thr Cys Arg Asp Leu Arg Leu Ser His Pro Glu Trp Ser Ser 
420 425 430 

Gly Tyr Tyr Trp Ile Asp Pro Asn Gln Gly Cys Thr Met Glu Ala Ile 
435 440 445 

Lys Val Tyr Cys Asp Phe Pro Thr Gly Glu Thr Cys Ile Arg Ala Gln 
450 455 460 

Pro Glu Asn Ile Pro Ala Lys Asn Trp Tyr Arg Ser Ser Lys Asp Lys 
465 470 475 480 

Lys His Val Trp Leu Gly Glu Thr Ile Asn Ala Gly Ser Gln Phe Glu 
485 490 495 

Tyr Asn Val Glu Gly Val Thr Ser Lys Glu Met Ala Thr Gln Leu Ala 
500 505 510 

Phe Met Arg Leu Leu Ala Asn Tyr Ala Ser Gln Asn Ile Thr Tyr His 
515 520 525 

Cys Lys Asn Ser Ile Ala Tyr Met Asp Glu Glu Thr Gly Asn Leu Lys 
530 535 540 

Lys Ala Val Ile Leu Gln Gly Ser Asn Asp Val Glu Leu Val Ala Glu 
545 550 555 560 

Gly Asn Ser Arg Phe Thr Tyr Thr Val Leu Val Asp Gly Cys Ser Lys 
565 570 575 

Lys Thr Asn Glu Trp Gly Lys Thr Ile Ile Glu Tyr Lys Thr Asn Lys 
580 585 590 

Pro Ser Arg Leu Pro Phe Leu Asp Ile Ala Pro Leu Asp Ile Gly Gly 
595 600 605 

Ala Asp His Glu Phe Phe Val Asp Ile Gly Pro Val Cys Phe Lys 
610 615 620 

(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 537 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: unknown 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

Met Leu Ser Phe Val Asp Thr Arg Thr Leu Leu Leu Leu Ala Val Thr 
1 5 10 15 

Leu Cys Leu Ala Thr Cys Gln Ser Leu Gln Glu Glu Thr Val Arg Lys 
20 25 30 

Gly Pro Ala Gly Asp Arg Gly Pro Arg Gly Glu Arg Gly Pro Pro Gly 
35 40 45 

Pro Pro Gly Arg Asp Gly Glu Asp Gly Pro Thr Gly Pro Pro Gly Pro 
50 55 60 

Pro Gly Pro Pro Gly Pro Pro Gly Leu Gly Gly Asn Phe Ala Ala Gln 
65 70 75 80 

Tyr Asp Gly Lys Gly Val Gly Leu Gly Pro Gly Pro Met Gly Leu Met 
85 90 95 

Gly Pro Arg Gly Pro Pro Gly Ala Ala Gly Ala Pro Gly Pro Gln Gly 
100 105 110 

Phe Gln Gly Pro Ala Gly Glu Pro Gly Glu Pro Gly Gln Thr Gly Pro 
115 120 125 

Ala Gly Ala Pro Gly Pro His Gly Pro Val Gly Pro Ala Gly Lys His 
130 135 140 

Gly Asn Arg Gly Glu Thr Gly Pro Ser Gly Pro Val Gly Pro Ala Gly 
145 150 155 160 

Ala Val Gly Pro Arg Gly Pro Ser Gly Pro Gln Gly Ile Arg Gly Asp 
165 170 175 

Lys Gly Glu Pro Gly Glu Lys Gly Pro Arg Gly Leu Pro Gly Phe Lys 
180 185 190 

Gly His Asn Gly Leu Gln Gly Leu Pro Gly Ile Ala Gly His His Gly 
195 200 205 

Asp Gln Gly Ala Pro Gly Ser Val Gly Pro Ala Gly Pro Arg Gly Pro 
210 215 220 

Ala Gly Pro Ser Gly Pro Ala Gly Lys Asp Gly Arg Thr Gly His Pro 
225 230 235 240 

Gly Thr Val Gly Pro Ala Gly Ile Arg Gly Pro Gln Gly His Gln Gly 
245 250 255 

Pro Ala Gly Pro Pro Gly Pro Pro Gly Pro Pro Gly Pro Pro Gly Val 
260 265 270 

Ser Gly Gly Gly Tyr Asp Phe Gly Tyr Asp Gly Asp Phe Tyr Arg Ala 
275 280 285 

Pro Tyr Tyr Gly Asp Glu Pro Met Asp Phe Lys Ile Asn Thr Asp Glu 
290 295 300 

Ile Met Thr Ser Leu Lys Ser Val Asn Gly Gln Ile Glu Ser Leu Ile 
305 310 315 320 

Ser Pro Asp Gly Ser Arg Lys Asn Pro Ala Arg Asn Cys Arg Asp Leu 
325 330 335 

Lys Phe Cys His Pro Glu Leu Lys Ser Gly Glu Tyr Trp Val Asp Pro 
340 345 350 

Asn Gln Gly Cys Lys Leu Asp Ala Ile Lys Val Phe Cys Asn Met Glu 
355 360 365 

Thr Gly Glu Thr Cys Ile Ser Ala Asn Pro Leu Asn Val Pro Arg Lys 
370 375 380 

His Trp Trp Thr Asp Ser Ser Ala Glu Lys Lys His Val Trp Phe Gly 
385 390 395 400 

Glu Ser Met Asp Gly Gly Phe Gln Phe Ser Tyr Gly Asn Pro Glu Leu 
405 410 415 

Pro Glu Asp Val Leu Asp Val Gln Leu Ala Phe Leu Arg Leu Leu Ser 
420 425 430 

Ser Arg Ala Ser Gln Asn Ile Thr Tyr His Cys Lys Asn Ser Ile Ala 
435 440 445 

Tyr Met Asp Gln Ala Ser Gly Asn Val Lys Lys Ala Leu Lys Leu Met 
450 455 460 

Gly Ser Asn Glu Gly Glu Phe Lys Ala Glu Gly Asn Ser Lys Phe Thr 
465 470 475 480 

Tyr Thr Val Leu Glu Asp Gly Cys Thr Lys His Thr Gly Glu Trp Ser 
485 490 495 

Lys Thr Val Phe Glu Tyr Arg Thr Arg Lys Ala Val Arg Leu Pro Ile 
500 505 510 

Val Asp Ile Ala Pro Tyr Asp Ile Gly Gly Pro Asp Gln Glu Phe Gly 
515 520 525 

Val Asp Val Gly Pro Val Cys Phe Leu 
530 535 

(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 534 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : 

(D) TOPOLOGY: unknown 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 5: 

Met Leu Ser Phe Val Asp Thr Arg Thr Leu Leu Leu Leu Ala Val Thr 
1 5 10 15 

Leu Cys Leu Ala Thr Cys Gln Ser Leu Gln Glu Glu Thr Val Arg Lys 
20 25 30 

Gly Pro Ala Gly Asp Arg Gly Pro Arg Gly Glu Arg Gly Pro Pro Gly 
35 40 45 

Pro Pro Gly Arg Asp Gly Glu Asp Gly Pro Thr Gly Pro Pro Gly Pro 
50 55 60 

Pro Gly Pro Pro Gly Pro Pro Gly Leu Gly Gly Asn Phe Ala Ala Gln 
65 70 75 80 

Tyr Asp Gly Lys Gly Val Gly Leu Gly Pro Gly Pro Met Gly Leu Met 
85 90 95 

Gly Pro Arg Gly Pro Pro Gly Ala Ala Gly Ala Pro Gly Pro Gln Gly 
100 105 110 

Phe Gln Gly Pro Ala Gly Glu Pro Gly Glu Pro Gly Gln Thr Gly Pro 
115 120 125 

Gly Ala Pro Gly Pro His Gly Pro Val Gly Pro Ala Gly Lys His Gly 
130 135 140 

Asn Arg Gly Glu Thr Gly Pro Ser Gly Pro Val Gly Pro Ala Gly Ala 
145 150 155 160 

Val Gly Pro Arg Gly Pro Ser Gly Pro Gln Gly Ile Arg Gly Asp Lys 
165 170 175 

Gly Glu Pro Gly Glu Lys Gly Pro Arg Gly Leu Pro Gly Phe Lys Gly 
180 185 190 

His Asn Gly Leu Gln Gly Leu Pro Gly Ile Ala Gly His His Gly Asp 
195 200 205 

Gln Gly Ala Pro Gly Ser Val Gly Pro Ala Gly Pro Arg Gly Pro Ala 
210 215 220 

Gly Pro Ser Gly Pro Ala Gly Lys Asp Gly Arg Thr Gly His Pro Gly 
225 230 235 240 

Thr Val Gly Pro Ala Gly Ile Arg Gly Pro Gln Gly His Gln Gly Pro 
245 250 255 

Ala Gly Pro Pro Gly Pro Pro Gly Pro Leu Gly Pro Leu Gly Val Ser 
260 265 270 

Gly Gly Gly Tyr Asp Phe Gly Tyr Asp Gly Asp Phe Tyr Arg Ala Asp 
275 280 285 

Gln Pro Arg Ser Ala Pro Ser Leu Arg Pro Lys Asp Tyr Glu Val Asp 
290 295 300 

Ala Thr Leu Lys Ser Leu Asn Asn Gln Ile Glu Thr Leu Leu Thr Pro 
305 310 315 320 

Glu Gly Ser Arg Lys Asn Pro Ala Arg Thr Cys Arg Asp Leu Arg Leu 
325 330 335 

Ser His Pro Glu Trp Ser Ser Gly Tyr Tyr Trp Ile Asp Pro Asn Gln 
340 345 350 

Gly Cys Thr Met Glu Ala Ile Lys Val Tyr Cys Asp Phe Pro Thr Gly 
355 360 365 

Glu Thr Cys Ile Arg Ala Gln Pro Glu Asn Ile Pro Ala Lys Asn Trp 
370 375 380 

Tyr Arg Ser Ser Lys Asp Lys Lys His Val Trp Leu Gly Glu Thr Ile 
385 390 395 400 

Asn Ala Gly Ser Gln Phe Glu Tyr Gly Asn Pro Glu Leu Pro Glu Asp 
405 410 415 

Val Leu Asp Val Gln Leu Ala Phe Leu Arg Leu Leu Ser Ser Arg Ala 
420 425 430 

Ser Gln Asn Ile Thr Tyr His Cys Lys Asn Ser Ile Ala Tyr Met Asp 
435 440 445 



WO 97/08311                                                    PCT/GB96/02122 



Glu Glu Thr Gly Asn Leu Lys Lys Ala Val Ile Leu Gln Gly Ser Asn 
450 455 460 

Asp Val Glu Leu Val Ala Glu Gly Asn Ser Arg Phe Thr Tyr Thr Val 
465 470 475 480 

Leu Val Asp Gly Cys Ser Lys Lys Thr Asn Glu Trp Gly Lys Thr Ile 
485 490 495 

Ile Glu Tyr Lys Thr Asn Lys Pro Ser Arg Leu Pro Phe Leu Asp Ile 
500 505 510 

Ala Pro Leu Asp Ile Gly Gly Ala Asp His Glu Phe Phe Val Asp Ile 
515 520 525 

Gly Pro Val Cys Phe Lys 
530 

C2) INFORMATION FOR SEQ ID NO : 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH : 23 amino acids 

(B) TYPE: ammo acid 

(C) STRANDEDNESS : 

( D ) TOPOLOGY : unknown 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

Gly Asn Pro Glu Leu Pro Glu Asp Val Leu Asp Val Gin Leu Ala Phe 
15 10 15 

Leu Arg Leu Leu Ser Ser Arg 
20 



(2) INFORMATION FOR SEQ ID NO: 7 
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(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : 

(D) TOPOLOGY: unknown 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 

Gly Gly Gin Gly Ser Asp Pro Ala Asp Val Ala lie Gin Leu Thr Phe 
15 10 15 

Leu Arg Leu Met Ser Thr Glu 
20 

(2) INFORMATION FOR SEQ ID NO : 8: 

( i ) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 2 3 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : 

(D) TOPOLOGY: unknown 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

Asn Val Glu Gly Val Thr Ser Lys Glu Met Ala Thr Gin Leu Ala Phe 
15 10 15 

Met Arg Leu Leu Ala Asn Tyr 
20 

(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 amino acids 
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(B) TYPE: amino acid 

(C) STRANDEDNESS : 

(D) TOPOLOGY: unknown 



(XI ) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 

Gly Asp Asp Asn Leu Ala Pro Asn Thr Ala Asn Val Gin Met Thr Phe 
15 10 15 

Leu Arg Leu Leu Ser Thr Glu 
20 

(2) INFORMATION FOR SEQ ID NO : 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : 

(D) TOPOLOGY: unknown 



(XI) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

Val Asp Ala Glu Gly Asn Pro Val Gly Val Val Gin Met Thr Phe Leu 
15 10 15 

Arg Leu Leu Ser Ala Ser 

20 

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACnERISTICS : 

(A) LENGTH: 22 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 
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(D) TOPOLOGY: unknown 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

Gly Asp His Gin Ser Pro Asn Thr Ala lie Thr Gin Met Thr Phe Leu 
15 10 15 

Arg Leu Leu Ser Lys Glu 
20 

(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 amino acids 

(B) TYPE : amino acid 

(C) STRANDEDNESS : 

(D) TOPOLOGY: unknown 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 

Leu Asp Val Glu Gly Asn Ser He Asn Met Val Gin Met Thr Phe Leu 
15 10 IS 

Lys Leu Leu Thr Ala Ser 
20 

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: unknown

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

Val Asp Ser Glu Gly Ser Pro Val Gly Val Val Gln Leu Thr Phe Leu
1 5 10 15

Arg Leu Leu Ser Val Ser 
20 

(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 250 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : 

(D) TOPOLOGY: unknown 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

Tyr Tyr Arg Ala Asp Asp Ala Asn Val Val Arg Asp Arg Asp Leu Glu
1 5 10 15

Val Asp Thr Thr Leu Lys Ser Leu Ser Gln Gln Ile Glu Asn Ile Arg
20 25 30

Ser Pro Glu Gly Ser Arg Lys Asn Pro Ala Arg Thr Cys Arg Asp Leu
35 40 45

Lys Met Cys His Ser Asp Trp Lys Ser Gly Glu Tyr Trp Ile Asp Pro
50 55 60

Asn Gln Gly Cys Asn Leu Asp Ala Ile Lys Val Phe Cys Asn Met Glu
65 70 75 80

Thr Gly Glu Thr Cys Val Tyr Pro Thr Gln Pro Ser Val Ala Gln Lys
85 90 95

Asn Trp Tyr Ile Ser Lys Asn Pro Lys Asp Lys Arg His Val Trp Phe
100 105 110

Gly Glu Ser Met Thr Asp Gly Phe Gln Phe Glu Tyr Gly Gly Gln Gly
115 120 125

Ser Asp Pro Ala Asp Val Ala Ile Gln Leu Thr Phe Leu Arg Leu Met
130 135 140

Ser Thr Glu Ala Ser Gln Asn Ile Thr Tyr His Cys Lys Asn Ser Val
145 150 155 160

Ala Tyr Met Asp Gln Gln Thr Gly Asn Leu Lys Lys Ala Leu Leu Leu
165 170 175

Lys Gly Ser Asn Glu Ile Glu Ile Arg Ala Glu Gly Asn Ser Arg Phe
180 185 190

Thr Tyr Ser Val Thr Val Asp Gly Cys Thr Ser His Thr Gly Ala Trp
195 200 205

Gly Lys Thr Val Ile Glu Tyr Lys Thr Thr Lys Thr Ser Arg Leu Pro
210 215 220

Ile Ile Asp Val Ala Pro Leu Asp Val Gly Ala Pro Asp Gln Glu Phe
225 230 235 240

Gly Phe Asp Val Gly Pro Val Cys Phe Leu
245 250

(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "PCR primer sequence" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 
AATGGAGCTC CTGGACCCAT G 21 
(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 42 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "PCR primer sequence" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 
AGGTGCTGAG CGAGGCTGGT CGGCAAAACC GCCAGCTTTT TC 42

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "PCR primer sequence"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:
CTGCTAGGTA CCAAATGGAA GGATTCAGCT TT
(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "PCR primer sequence"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 18: 
GACCAGCCTC GCTCAGCA 
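
The primer listings for SEQ ID NOs 15 to 18 lend themselves to a quick consistency check. The sketch below is illustrative only and is not taken from the specification: it compares each primer with its stated length and looks for the reverse complement of SEQ ID NO: 18 inside SEQ ID NO: 16. The helper names `clean` and `reverse_complement` are hypothetical, and no claim is made here about how the primers were used experimentally.

```python
# Primer sequences as printed for SEQ ID NOs 15-18, paired with stated lengths.
PRIMERS = {
    15: ("AATGGAGCTC CTGGACCCAT G", 21),
    16: ("AGGTGCTGAG CGAGGCTGGT CGGCAAAACC GCCAGCTTTT TC", 42),
    17: ("CTGCTAGGTA CCAAATGGAA GGATTCAGCT TT", 32),
    18: ("GACCAGCCTC GCTCAGCA", 18),
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def clean(seq: str) -> str:
    """Remove the block spacing used in the listing and upper-case the bases."""
    return seq.replace(" ", "").upper()


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return clean(seq).translate(COMPLEMENT)[::-1]


# Does each printed primer match the length given under (A) LENGTH?
length_ok = {n: len(clean(s)) == stated for n, (s, stated) in PRIMERS.items()}

# 0-based offset of revcomp(SEQ ID NO: 18) within SEQ ID NO: 16, or -1 if absent.
offset = clean(PRIMERS[16][0]).find(reverse_complement(PRIMERS[18][0]))

print(length_ok)
print("reverse complement of SEQ 18 found in SEQ 16 at offset", offset)
```

A result of -1 for `offset` would indicate that the two primers share no such complementary stretch as transcribed.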

(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1608 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 
ATGCTCAGCT TTGTGGATAC GCGGACTTTG TTGCTGCTTG CAGTAACCTT ATGCCTAGCA 60
ACATGCCAAT CTTTACAAGA GGAAACTGTA AGAAAGGGCC CAGCCGGAGA TAGAGGACCA 120
CGTGGAGAAA GGGGTCCACC AGGCCCCCCA GGCAGAGATG GTGAAGATGG TCCCACAGGC 180
CCTCCTGGTC CACCTGGTCC TCCTGGCCCC CCTGGTCTCG GTGGGAACTT TGCTGCTCAG 240
TATGATGGAA AAGGAGTTGG ACTTGGCCCT GGACCAATGG GCTTAATGGG ACCTAGAGGC 300
CCACCTGGTG CAGCTGGAGC CCCAGGCCCT CAAGGTTTCC AAGGACCTGC TGGTGAGCCT 360
GGTGAACCTG GTCAAACTGG TCCTGCAGGT GCACCTGGTC CTCATGGCCC CGTGGGTCCT 420
GCTGGCAAAC ATGGAAACCG TGGTGAAACT GGTCCTTCTG GTCCTGTTGG TCCTGCTGGT 480
GCTGTTGGCC CAAGAGGTCC TAGTGGCCCA CAAGGCATTC GTGGCGATAA GGGAGAGCCC 540
GGTGAAAAGG GGCCCAGAGG TCTTCCTGGC TTCAAGGGAC ACAATGGATT GCAAGGTCTG 600
CCTGGTATCG CTGGTCACCA TGGTGATCAA GGTGCTCCTG GCTCCGTGGG TCCTGCTGGT 660
CCTAGGGGCC CTGCTGGTCC TTCTGGCCCT GCTGGAAAAG ATGGTCGCAC TGGACATCCT 720
GGTACGGTTG GACCTGCTGG CATTCGAGGC CCTCAGGGTC ACCAAGGCCC TGCTGGCCCC 780
CCTGGTCCCC CTGGCCCTCC TGGACCTCCA GGTGTAAGCG GTGGTGGTTA TGACTTTGGT 840
TACGATGGAG ACTTCTACAG GGCTGACCAG CCTCGCTCAG CACCTTCTCT CAGACCCAAG 900
GACTATGAAG TTGATGCTAC TCTGAAGTCT CTCAACAACC AGATTGAGAC CCTTCTTACT 960
CCTGAAGGCT CTAGAAAGAA CCCAGCTCGC ACATGCCGTG ACTTGAGACT CAGCCACCCA 1020
GAGTGGAGCA GCGGTTACTA CTGGATTGAC CCCAACCAAG GATGCACTAT GGAAGCCATC 1080
AAAGTATACT GTGATTTCCC TACCGGCGAA ACCTGTATCC GGGCCCAACC TGAAAACATC 1140
CCAGCCAAGA ACTGGTATAG GAGCTCCAAG GACAAGAAAC ACGTCTGGCT AGGAGAAACT 1200
ATCAATGCTG GCAGCCAGTT TGAATATAAT GTTGAAGGAG TGACTTCCAA GGAAATGGCT 1260
ACCCAACTTG CCTTCATGCG CCTGCTGGCC AACTATGCCT CTCAGAACAT CACCTACCAC 1320
TGCAAGAACA GCATTGCATA CATGGATGAG GAGACTGGCA ACCTGAAAAA GGCTGTCATT 1380
CTACAGGGCT CTAATGATGT TGAACTTGTT GCTGAGGGCA ACAGCAGGTT CACTTACACT 1440
GTTCTTGTAG ATGGCTGCTC TAAAAAGACA AATGAATGGG GAAAGACAAT CATTGAATAC 1500
AAAACAAATA AGCCATCACG CCTGCCCTTC CTTGATATTG CACCTTTGGA CATCGGTGGT 1560
GCTGACCATG AATTCTTTGT GGACATTGGC CCAGTCTGTT TCAAATAA 1608
(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1881 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:

ATGATGAGCT TTGTGCAAAA GGGGAGCTGG CTACTTCTCG CTCTGCTTCA TCCCACTATT 60
ATTTTGGCAC AACAGGAAGC TGTTGAAGGA GGATGTTCCC ATCTTGGTCA GTCCTATGCG 120
GATAGAGATG TCTGGAAGCC AGAACCATGC CAAATATGTG TCTGTGACTC AGGATCCGTT 180
CTCTGCGATG ACATAATATG TGACGATCAA GAATTAGACT CCCCCAACCC AGAAATTCCA 240
TTTGGAGAAT GTTGTGCAGT TTGCCCACAG CCTCCAACTG CTCCTACTCG CCCTCCTAAT 300
GGTCAAGGAC CTCAAGGCCC CAAGGGAGAT CCAGGCCCTC CTGGTATTCC TGGGAGAAAT 360
GGTGACCCTG GTATTCCAGG ACAACCAGGG TCCCCTGGTT CTCCTGGCCC CCCTGGAATC 420
TGTGAATCAT GCCCTACTGG TCCTCAGAAC TATTCTCCCC AGTATGATTC ATATGATGTC 480
AAGTCTGGAG TAGCAGTAGG AGGACTCGCA GGCTATCCTG GACCAGCTGG CCCCCCAGGC 540
CCTCCCGGTC CCCCTGGTAC ATCTGGTCAT CCTGGTTCCC CTGGATCTCC AGGATACCAA 600
GGACCCCCTG GTGAACCTGG GCAAGCTGGT CCTTCAGGCC CTCCAGGACC TCCTGGTGCT 660
ATAGGTCCAT CTGGTCCTGC TGGAAAAGAT GGAGAATCAG GTAGACCCGG ACGACCTGGA 720
GAGCGAGGAT TGCCTGGACC TCCAGGTATC AAAGGTCCAG CTGGGATACC TGGATTCCCT 780
GGTATGAAAG GACACAGAGG CTTCGATGGA CGAAATGGAG AAAAGGGTGA AACAGGTGCT 840
CCTGGATTAA AGGGTGAAAA TGGTCTTCCA GGCGAAAATG GAGCTCCTGG ACCCATGGGT 900
CCAAGAGGGG CTCCTGGTGA GCGAGGACGG CCAGGACTTC CTGGGGCTGC AGGTGCTCGG 960
GGTAATGACG GTGCTCGAGG TAACAGAGGT GAAAGAGGAT CTGAGGGCTC CCCAGGCCAC 1020
CCAGGGCAAC CAGGCCCTCC TGGACCTCCT GGTGCCCCTG GTCCTTGCTG TGGTGGTGTT 1080
GGAGCCGCTG CCATTGCTGG GATTGGAGGT GAAAAAGCTG GCGGTTTTGC CCCGTATTAT 1140
GGAGATGAAC CAATGGATTT CAAAATCAAC ACCGATGAGA TTATGACTTC ACTCAAGTCT 1200
GTTAATGGAC AAATAGAAAG CCTCATTAGT CCTGATGGTT CTCGTAAAAA CCCCGCTAGA 1260
AACTGCAGAG ACCTGAAATT CTGCCATCCT GAACTCAAGA GTGGAGAATA CTGGGTTGAC 1320
CCTAACCAAG GATGCAAATT GGATGCTATC AAGGTATTCT GTAATATGGA AACTGGGGAA 1380
ACATGCATAA GTGCCAATCC TTTGAATGTT CCACGGAAAC ACTGGTGGAC AGATTCTAGT 1440
GCTGAGAAGA AACACGTTTG GTTTGGAGAG TCCATGGATG GTGGTTTTCA GTTTAGCTAC 1500
GGCAATCCTG AACTTCCTGA AGATGTCCTT GATGTGCAGC TGGCATTCCT TCGACTTCTC 1560
TCCAGCCGAG CTTCCCAGAA CATCACATAT CACTGCAAAA ATAGCATTGC ATACATGGAT 1620
CAGGCCAGTG GAAATGTAAA GAAGGCCCTG AAGCTGATGG GGTCAAATGA AGGTGAATTC 1680
AAGGCTGAAG GAAATAGCAA ATTCACCTAC ACAGTTCTGG AGGATGGTTG CACGAAACAC 1740
ACTGGGGAAT GGAGCAAAAC AGTCTTTGAA TATCGAACAC GCAAGGCTGT GAGACTACCT 1800
ATTGTAGATA TTGCACCCTA TGACATTGGT GGTCCTGATC AAGAATTTGG TGTGGACGTT 1860
GGCCCTGTTT GCTTTTTATA A 1881
CLAIMS

1. A molecule comprising at least a first moiety having the activity of a
procollagen C-propeptide and a second moiety selected from any one of the group of an
alien collagen α-chain and non-collagen materials, the first moiety being attached to the
second moiety.

2. A molecule according to claim 1 wherein the first moiety comprises an 
existing C-propeptide or a molecule resulting from partial modification thereof or an 
analogue thereof. 

3. A molecule according to claim 2 wherein the existing C-propeptide is selected
from any one of the group of the proα1(I), proα2(I), proα1(II), proα1(III), proα1(V),
proα2(V), proα3(V), proα1(XI), proα2(XI), and proα3(XI) proα chain C-propeptides.

4. A molecule according to claim 1 wherein the first moiety comprises a novel 
C-propeptide. 

5. A molecule according to claim 4 wherein the C-propeptide comprises a C- 
propeptide substituted at the recognition site. 

6. A molecule according to claim 5 wherein the C-propeptide has been 
substituted at the recognition site with the recognition sequence of an existing C-propeptide 
or a partially modified form thereof or an analogue thereof. 

7. A molecule according to claim 6 wherein the C-propeptide has been
substituted at the recognition site with the recognition sequence of the C-propeptide of any
one of the group of proα1(III), proα1(I), proα2(I), proα1(II), proα1(V), proα2(V),
proα1(XI) and proα2(XI) proα chains.

8. A molecule according to claim 7 wherein the C-propeptide has been
substituted at the recognition site with a recognition sequence having the sequence of any
one of the group of SEQ ID NOs: 6-13.

9. A molecule according to claim 5 wherein the C-propeptide is substituted at 
the recognition site with a novel recognition sequence. 

10. A molecule according to any one of claims 2-9 wherein the C-propeptide
and/or the recognition sequence is that of a fibrillar proα chain.

11. A molecule according to any one of the preceding claims wherein the second
moiety comprises at least a collagen α-chain.

12. A molecule according to claim 11 wherein the collagen α-chain is selected
from any one of the group of proα1(I) chain, proα2(I) chain, proα1(II) chain, proα1(III)
chain, proα1(V) chain, proα2(V) chain, proα3(V) chain, proα1(XI) chain, proα2(XI) chain,
and proα3(XI) chain collagen α-chains.

13. A molecule according to any one of the preceding claims wherein the second
moiety also comprises a proα chain N-propeptide.

14. A molecule according to claim 13 wherein the N-propeptide is selected from
any one of the group of the proα1(I), proα2(I), proα1(II), proα1(III), proα1(V), proα2(V),
proα3(V), proα1(XI), proα2(XI), and proα3(XI) proα chain N-propeptides.

15. A molecule according to claim 14 wherein it comprises a first moiety having
the activity of the proα1(III) C-propeptide attached to a second moiety comprising the
collagen α-chain and N-propeptide of the proα2(I) chain.

16. A molecule according to claim 13 wherein it has the sequence of SEQ ID NO: 
4. 

17. A molecule according to any one of the preceding claims wherein the first and 
second moieties are separated by intervening amino acid residues. 

18. A collagen molecule comprising a non-natural combination of collagen
α-chains.

19. A collagen fibril comprising collagen molecules according to claim 18. 

20. A collagen fibre comprising collagen fibrils according to claim 19. 

21. A molecule according to any one of the preceding claims for use in a method 
of treatment or diagnosis of the human or animal body. 

22. A molecule according to claim 21 for use in the treatment of procollagen 
suicide. 

23. A molecule according to claim 21 for use as an adhesive or an implant.

24. A molecule according to claim 21 wherein it is for use in promoting the
healing of wounds or fibrotic disorders with reduced scarring.

25. A molecule according to claim 21 wherein it is for use in promoting the
healing of chronic wounds.

26. A molecule according to any one of claims 1 to 20 for use in photography,
brewing, foodstuffs, textiles or adhesives.

27. A method of treatment or diagnosis of the human or animal body comprising
the use of a molecule according to any one of claims 1 to 25.

28. DNA encoding a molecule according to any one of claims 1-20. 

29. An expression host transformed or transfected with DNA according to claim 
28 operably linked to regulatory sequences sufficient to direct expression. 

30. A transgenic animal whose genome comprises DNA according to claim 29
operably linked to regulatory sequences sufficient to direct expression.

31. A transgenic animal according to claim 30, the animal being a non-human
placental mammal and the regulatory sequences directing expression in the milk of an adult
female.

32. A method of producing a non-natural collagen, the method comprising
harvesting the collagen from an expression host according to claim 29 or a transgenic animal
according to claim 30 or 31 and optionally subsequently purifying the non-natural collagen.
Figure 1

alpha1(I)    YYRADD...A NVVRDRDLEV DTTLKSLSQQ IENIRSPEGS RKNPARTCRD
alpha2(I)    FYRADQPRSA PSLRPKDYEV DATLKSLNNQ IETLLTPEGS RKNPARTCRD
alpha1(III)  YYGDE....P MDFKINTDEI MTSLKSVNGQ IESLISPDGS RKNPARNCRD

alpha1(I)    LKMCHSDWKS GEYWIDPNQG CNLDAIKVFC NMETGETCVY PTQPSVAQKN
alpha2(I)    LRLSHPEWSS GYYWIDPNQG CTMEAIKVYC DFPTGETCIR AQPENIPAKN
alpha1(III)  LKFCHPELKS GEYWVDPNQG CKLDAIKVFC NMETGETCIS ANPLNVPRKH

alpha1(I)    WYISKNPKDK RHVWFGESMT DGFQFEYGGQ GSDPADVAIQ LTFLRLMSTE
alpha2(I)    WYRS..SKDK KHVWLGETIN AGSQFEYNVE GVTSKEMATQ LAFMRLLANY
alpha1(III)  WW.TDSSAEK KHVWFGESMD GGFQFSYGNP ELPEDVLDVQ LAFLRLLSSR

alpha1(I)    ASQNITYHCK NSVAYMDQQT GNLKKALLLK GSNEIEIRAE GNSRFTYSVT
alpha2(I)    ASQNITYHCK NSIAYMDEET GNLKKAVILQ GSNDVELVAE GNSRFTYTVL
alpha1(III)  ASQNITYHCK NSIAYMDQAS GNVKKALKLM GSNEGEFKAE GNSKFTYTVL

alpha1(I)    VDGCTSHTGA WGKTVIEYKT TKTSRLPIID VAPLDVGAPD QEFGFDVGPV CFL
alpha2(I)    VDGCSKKTNE WGKTIIEYKT NKPSRLPFLD IAPLDIGGAD HEFFVDIGPV CFK
alpha1(III)  EDGCTKKTGE WSKTVFEYRT RKAVRLPIVD IAPYDIGGPD QEFGVDVGPV CFL

[The original figure also carries the labels CP, A, F, B, C and G, the numerals 1 to 8, and conservation marks (# and -) around the alignment; their column positions are not recoverable from this text.]
Figure 3

alpha 1(I)    GGQGSDPADV AIQLTFLRLM STE

alpha 2(I)    NVEGVTSKEM ATQLAFMRLL ANY

alpha 1(II)   GDDNLAPNTA NVQMTFLRLL STE

alpha 1(III)  GNPELPEDVL DVQLAFLRLL SSR

alpha 1(V)    VDAEGNPVGV .VQMTFLRLL SAS

alpha 2(V)    GDHQSPNTAI .TQMTFLRLL SKE

alpha 1(XI)   LDVEGNSINM .VQMTFLKLL TAS

alpha 2(XI)   VDSEGSPVGV .VQLTFLRLL SVS
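
As an illustration of how the Figure 3 alignment can be read, the sketch below lists the columns in which all eight recognition sequences carry the same residue. It is not part of the specification: the function name `conserved_columns` is hypothetical, the rows are transcribed from Figure 3 with illegible characters reconciled against SEQ ID NOs 7 to 13 (an assumption of this sketch), and `.` marks the single alignment gap in the 22-residue sequences.

```python
# Rows of the Figure 3 alignment, one-letter code, "." = alignment gap.
ALIGNMENT = {
    "alpha1(I)":   "GGQGSDPADVAIQLTFLRLMSTE",
    "alpha2(I)":   "NVEGVTSKEMATQLAFMRLLANY",
    "alpha1(II)":  "GDDNLAPNTANVQMTFLRLLSTE",
    "alpha1(III)": "GNPELPEDVLDVQLAFLRLLSSR",
    "alpha1(V)":   "VDAEGNPVGV.VQMTFLRLLSAS",
    "alpha2(V)":   "GDHQSPNTAI.TQMTFLRLLSKE",
    "alpha1(XI)":  "LDVEGNSINM.VQMTFLKLLTAS",
    "alpha2(XI)":  "VDSEGSPVGV.VQLTFLRLLSVS",
}


def conserved_columns(rows, gap="."):
    """Return (1-based column, residue) pairs identical in every row, ignoring gapped columns."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all aligned rows must have the same length")
    result = []
    for index, column in enumerate(zip(*rows), start=1):
        if gap not in column and len(set(column)) == 1:
            result.append((index, column[0]))
    return result


conserved = conserved_columns(list(ALIGNMENT.values()))
print(conserved)
```

Columns that differ in only one or two chains are not reported by this strict test; relaxing the `len(set(column)) == 1` condition would be one way to surface them.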
Figure 4

[Gel image. Lanes 1 to 5: reducing; lanes 6 to 9: non-reducing. Molecular weight markers at 66 kDa, 45 kDa and 29 kDa; trimer and dimer bands are indicated.]

Lane       procollagen construct translated

1          molecular weight markers
2 and 6    α1(III)Δ1
3 and 7    α2(I)Δ1
4 and 8    α2(I):(III)CP
5 and 9    α1(III):(I)CP
Figure 5

[Gel image. Lanes 1 to 6: reducing; lanes 7 to 11: non-reducing. Molecular weight markers at 66 kDa and 45 kDa.]

Lane        procollagen construct translated

1           molecular weight markers
2 and 7     A-join
3 and 8     F-join
4 and 9     B-join
5 and 10    C-join
6 and 11    recip-C-join
Figure 6

[Gel image. Lanes 1 to 7; the protease-resistant fragment is indicated.]

Lane    dipyridyl present    procollagen translated

1       molecular weight markers
2       No                   proα2(I):(III)CP
3       Yes                  proα2(I):(III)CP
4       No                   BGR
5       Yes                  BGR
6       No                   proα1(III):(I)CP
7       Yes                  proα1(III):(I)CP
Figure 7

[Gel image. Lanes 1 to 3: reducing; lanes 4 to 6: non-reducing. Molecular weight marker at 66 kDa; trimer, dimer and monomer bands are indicated.]

Lanes      procollagen construct translated

1 and 4    BGR variant
2 and 5    BGR variant
3 and 6    BGR variant

[The superscripts distinguishing the BGR constructs, one of which reads L-M, are not legible in this text.]
[Gel image. Lanes 1 to 4. Molecular weight markers at 66 kDa, 43 kDa, 29 kDa and 18.4 kDa.]

Lane      procollagen construct

1         molecular weight markers
2 to 4    BGR variants

[The superscripts distinguishing the BGR constructs, one of which reads L-M, are not legible in this text.]
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